Preface


We live in a highly connected world, with multiple self-interested agents inter- 
acting, leading to myriad opportunities for conflict and cooperation. Understanding 
these is the goal of game theory. It finds application in fields such as economics, 
business, political science, biology, psychology, sociology, computer science, and en- 
gineering. Conversely, ideas from the social sciences (e.g., fairness), from biology 
(evolutionary stability), from statistics (adaptive learning), and from computer sci- 
ence (complexity of finding equilibria) have greatly enriched game theory. In this 
book, we present an introduction to this field. We will see applications from a vari- 
ety of disciplines and delve into some of the fascinating mathematics that underlies 
game theory. 


An overview of the book 


Part I: Analyzing games: Strategies and equilibria. We begin in Chapter 1
with combinatorial games, in which two players take turns making moves
until a winning position for one of them is reached.


FIGURE 1. Two people playing Nim. 


A classic example of a combinatorial game is Nim. In this game, there are 
several piles of chips, and players take turns removing one or more chips from a 
single pile. The player who takes the last chip wins. We will describe a winning 
strategy for Nim and show that a large class of combinatorial games can be reduced 
to it. 

Other well-known combinatorial games are Chess, Go, and Hex. The youngest 
of these is Hex, which was invented by Piet Hein in 1942 and independently by John 
Nash in 1947. Hex is played on a rhombus-shaped board tiled with small hexagons 
(see Figure 2). Two players, Blue and Yellow, alternate coloring in hexagons in their
assigned color, blue or yellow, one hexagon per turn. Blue wins if she produces 


FIGURE 2. The board for the game of Hex.


a blue chain crossing between her two sides of the board and Yellow wins if he 
produces a yellow chain connecting the other two sides. 

We will show that the player who moves first has a winning strategy; finding 
this strategy remains an unsolved problem, except when the board is small. 


FIGURE 3. The board position near the end of the match between Queenbee 
and Hexy at the 5th Computer Olympiad. Each hexagon is labeled by the 
time at which it was placed on the board. Blue moves next, but Yellow has a 
winning strategy. Can you see why? 


In an interesting variant of the game, the players, instead of alternating turns, 
toss a coin to determine who moves next. In this case, we can describe optimal 
strategies for the players. Such random-turn combinatorial games are the
subject of a later chapter.

In Chapters we consider games in which the players simultaneously 
select from a set of possible actions. Their selections are then revealed, resulting 
in a payoff to each player. For two players, these payoffs are represented using the 
matrices $A = (a_{ij})$ and $B = (b_{ij})$. When player I selects action i and player II
selects action j, the payoffs to these players are $a_{ij}$ and $b_{ij}$, respectively.
Two-person games where one player’s gain is the other player’s loss, that is,
$a_{ij} + b_{ij} = 0$ for all i, j, are called zero-sum games. Such games are the topic of
Chapter 2. We show that every zero-sum game has a value V such that player I can ensure
her expected payoff is at least V (no matter how II plays) and player II can ensure 
he pays I at most V (in expectation) no matter how I plays. 

For example, in Penalty Kicks, a zero-sum game inspired by soccer, one 
player, the kicker, chooses to kick the ball either to the left or to the right of the 
other player, the goalie. At the same instant as the kick, the goalie guesses whether 
to dive left or right. 


FIGURE 4. The game of Penalty Kicks. 


The goalie has a chance of saving the goal if he dives in the same direction as the 
kick. The kicker, who we assume is right-footed, has a greater likelihood of success 
if she kicks right. The probabilities that the penalty kick scores are displayed in 
the table below: 


                  goalie
                 L       R
kicker    L     0.5      1
          R      1      0.8


For this set of scoring probabilities, the optimal strategy for the kicker is to kick left 
with probability 2/7 and kick right with probability 5/7 — then regardless of what 
the goalie does, the probability of scoring is 6/7. Similarly, the optimal strategy for 
the goalie is to dive left with probability 2/7 and dive right with probability 5/7. 
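
The mixtures quoted above can be checked by equalizing payoffs. The following is a minimal sketch, assuming a 2 x 2 zero-sum game with no saddle point (so that both players mix); the function name `solve_2x2` and its argument order are illustrative choices, not notation from the book.

```python
from fractions import Fraction


def solve_2x2(a, b, c, d):
    """Solve the zero-sum game with payoff matrix [[a, b], [c, d]].

    Rows are the maximizer's actions, columns the minimizer's.
    Assumes there is no saddle point, so both players randomize.
    Returns (p, q, value), where p is the probability of row 1
    and q is the probability of column 1.
    """
    a, b, c, d = (Fraction(v) for v in (a, b, c, d))
    denom = a - b - c + d
    if denom == 0:
        raise ValueError("degenerate game: the indifference equations have no unique solution")
    p = (d - c) / denom          # makes the column player indifferent
    q = (d - b) / denom          # makes the row player indifferent
    value = (a * d - b * c) / denom
    return p, q, value


# Penalty Kicks: rows = kicker (L, R), columns = goalie (L, R).
p, q, value = solve_2x2(Fraction(1, 2), 1, 1, Fraction(4, 5))
print(p, q, value)  # 2/7 2/7 6/7
```

Exact rational arithmetic is used so that the output can be compared directly with the fractions 2/7, 5/7, and 6/7 in the text.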

Chapter 3 goes on to analyze a number of interesting zero-sum games on
graphs. For example, we consider a game between a Troll and a Traveler. Each 
of them chooses a route (a sequence of roads) from Syracuse to Troy, and then they 
simultaneously disclose their routes. Each road has an associated toll. For each 
road chosen by both players, the traveler pays the toll to the troll. We find optimal 
strategies by developing a connection with electrical networks. 

In Chapter 4 we turn to general-sum games. In these games, players
no longer have optimal strategies. Instead, we focus on situations where each 
player’s strategy is a best response to the strategies of the opponents: a Nash 
equilibrium is an assignment of (possibly randomized) strategies to the players, 
with the property that no player can gain by unilaterally changing his strategy. 




It turns out that every general-sum game has at least one Nash equilibrium. The
proof of this fact requires an important geometric tool, the Brouwer fixed-point
theorem, which is covered in Chapter 5.


FIGURE 5. Prisoner’s Dilemma: the prisoners considering the possible conse- 
quences of confessing or remaining silent. 


The most famous general-sum game is the Prisoner’s Dilemma. If one pris- 
oner confesses and the other remains silent, then the first goes free and the second 
receives a ten-year sentence. They will be sentenced to eight years each if they both 
confess and to one year each if they both remain silent. The only equilibrium in 
this game is for both to confess, but the game becomes more interesting when it is 
repeated, as we discuss in Chapter 6. More generally, in Chapter 6 we consider
games where players alternate moves as in Chapter 1 but the payoffs are general
as in Chapter 4. These are called extensive-form games. Often these games
involve imperfect information, where players do not know all actions that have
been taken by their opponents. For instance, in the 1962 Cuban Missile Crisis, the 
U.S. did not know whether the U.S.S.R. had installed nuclear missiles in Cuba and 
had to decide whether to bomb the missile sites in Cuba without knowing whether 
or not they were fitted with nuclear warheads. (The U.S. used a naval blockade 
instead.) We also consider games of incomplete information where the players 
do not even know exactly what game they are playing. For instance, in poker, the 
potential payoffs to a player depend on the cards dealt to his opponents. 

One criticism of optimal strategies and equilibria in game theory is that finding 
them requires hyperrational players that can analyze complicated strategies. How- 
ever, it was observed that populations of termites, spiders, and lizards can arrive 
at a Nash equilibrium just via natural selection. The equilibria that arise in such 
populations have an additional property called evolutionary stability, which is
discussed in Chapter 7.

In the same chapter, we also introduce correlated equilibria. When two 
drivers approach an intersection, there is no good Nash equilibrium. For example, 
the convention of yielding to a driver on your right is problematic in a four-way 
intersection. A traffic light serves as a correlating device that ensures each driver is 
incentivized to follow the indications of the light. Correlated equilibria generalize 
this idea. 
In Chapter 9 we compare outcomes in Nash equilibrium to outcomes that
could be achieved by a central planner optimizing a global objective function. For 
example, in Prisoner’s Dilemma, the total loss (combined jail time) in the unique 
Nash equilibrium is 16 years; the minimum total loss is 2 years (if both stay silent). 
Thus, the ratio, known as the price of anarchy of the game, is 8. Another example 
compares the average driving time in a road network when the drivers are selfish 
(i.e., in a Nash equilibrium) to the average driving time in an optimal routing. 
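
The price-of-anarchy figure for Prisoner's Dilemma can be reproduced by enumeration. The sketch below uses the sentences stated earlier in this preface; it checks pure strategy profiles only, which suffices here because confessing is strictly better whatever the other prisoner does. The names `YEARS` and `is_pure_nash` are illustrative.

```python
from itertools import product

ACTIONS = ("confess", "silent")

# YEARS[(a1, a2)] = (sentence of prisoner 1, sentence of prisoner 2)
YEARS = {
    ("confess", "confess"): (8, 8),
    ("confess", "silent"): (0, 10),
    ("silent", "confess"): (10, 0),
    ("silent", "silent"): (1, 1),
}


def is_pure_nash(profile):
    """True if no prisoner can shorten his own sentence by deviating alone."""
    for player in (0, 1):
        for alternative in ACTIONS:
            deviation = list(profile)
            deviation[player] = alternative
            if YEARS[tuple(deviation)][player] < YEARS[profile][player]:
                return False
    return True


equilibria = [p for p in product(ACTIONS, repeat=2) if is_pure_nash(p)]
equilibrium_loss = max(sum(YEARS[p]) for p in equilibria)   # total years in equilibrium
optimal_loss = min(sum(years) for years in YEARS.values())  # best achievable total
price_of_anarchy = equilibrium_loss / optimal_loss
print(equilibria, equilibrium_loss, optimal_loss, price_of_anarchy)
```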


FIGURE 6. An unstable pair. 


Part II: Designing games and mechanisms. So far, we have considered 
predefined games, and our goal was to understand the outcomes that we can expect 
from rational players. In the second part of the book, we also consider mechanism 
design where we start with desired properties of the outcome (e.g., high profit or 
fairness) and attempt to design a game (or market or scheme) that incentivizes 
players to reach an outcome that meets our goals. Applications of mechanism 
design include voting systems, auctions, school choice, environmental regulation, 
and organ donation. 

For example, suppose that there are n men and n women, where each man has 
a preference ordering of the women and vice versa. A matching between them is 
stable if there is no unstable pair, i.e., a man and woman who prefer each other 
to their partners in the matching. In Chapter 10 we introduce the Gale-Shapley
algorithm for finding a stable matching. A generalization of stable matching is 
used by the National Resident Matching Program, which matches about 20,000 
new doctors to residency programs at hospitals every year. 
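
The stability condition and the proposal procedure named above can be sketched as follows. This is the men-proposing deferred-acceptance form of the Gale-Shapley algorithm as it is commonly presented, which is not necessarily the exact formulation of Chapter 10; the function names and the three-person preference lists are illustrative assumptions.

```python
def gale_shapley(men_prefs, women_prefs):
    """Return a dict man -> woman that is a stable matching.

    men_prefs[m] and women_prefs[w] are complete preference lists,
    most preferred first, over the other side.
    """
    # rank[w][m] = position of m in w's list (smaller is better)
    rank = {w: {m: i for i, m in enumerate(prefs)} for w, prefs in women_prefs.items()}
    next_choice = {m: 0 for m in men_prefs}   # index of the next woman m will propose to
    engaged_to = {}                           # woman -> man
    free_men = list(men_prefs)
    while free_men:
        m = free_men.pop()
        w = men_prefs[m][next_choice[m]]
        next_choice[m] += 1
        current = engaged_to.get(w)
        if current is None:
            engaged_to[w] = m                 # w was free and accepts
        elif rank[w][m] < rank[w][current]:
            engaged_to[w] = m                 # w trades up; her old partner is free again
            free_men.append(current)
        else:
            free_men.append(m)                # w rejects m
    return {m: w for w, m in engaged_to.items()}


def is_stable(matching, men_prefs, women_prefs):
    """Check that no man and woman prefer each other to their assigned partners."""
    husband = {w: m for m, w in matching.items()}
    for m, prefs in men_prefs.items():
        for w in prefs:
            if w == matching[m]:
                break                         # remaining women are less preferred by m
            if women_prefs[w].index(m) < women_prefs[w].index(husband[w]):
                return False                  # (m, w) is an unstable pair
    return True


men_prefs = {"a": ["x", "y", "z"], "b": ["y", "x", "z"], "c": ["x", "y", "z"]}
women_prefs = {"x": ["b", "a", "c"], "y": ["a", "b", "c"], "z": ["a", "b", "c"]}
matching = gale_shapley(men_prefs, women_prefs)
print(matching, is_stable(matching, men_prefs, women_prefs))
```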

Chapter 11 considers the design of mechanisms for fair division. Consider
the problem of dividing a cake with several different toppings among several peo- 
ple. Each topping is distributed over some portion of the cake, and each person 
prefers some toppings to others. If there are just two people, there is a well-known 
mechanism for dividing the cake: One cuts it in two, and the other chooses which 
piece to take. Under this system, each person is at least as happy with what he
receives as he would be with the other person’s share. What if there are three or 
more people? We also consider a 2000-year-old problem: how to divide an estate 
between several creditors whose claims exceed the value of the estate. 

The topic of Chapter 12 is cooperative game theory, in which players form
coalitions in order to maximize their utility. As an example, suppose that three 
people have gloves to sell. Two are each selling a single, left-handed glove, while the 
third is selling a right-handed one. A wealthy tourist enters the store in dire need of 
a pair of gloves. She refuses to deal with the glove-bearers individually, so at least 
two of them must form a coalition to sell a left-handed and a right-handed glove to 
her. The third player has an advantage because his commodity is in scarcer supply. 
Thus, he should receive a higher fraction of the price the tourist pays. However, if 
he holds out for too high a fraction of the payment, the other players may agree 
between themselves that he must pay both of them in order to obtain a left glove. 
A related topic discussed in the chapter is bargaining, where the classical solution 
is again due to Nash. 


FIGURE 7. Voting in Florida during the 2000 U.S. presidential election. 


In Chapter 13 we turn to social choice: designing mechanisms that aggregate
the preferences of a collection of individuals. The most basic example is the design
of voting schemes. We prove Arrow’s Impossibility Theorem, which implies that
all voting systems are strategically vulnerable. However, some systems are better 
than others. For example, the widely used system of runoff elections is not even 
monotone; i.e., transferring votes from one candidate to another might lead the 
second candidate to lose an election he would otherwise win. In contrast, Borda 
count and approval voting are monotone and more resistant to manipulation. 
Chapter 14 studies auctions for a single item. We compare different auction
formats such as first-price (selling the item to the highest bidder at a price equal 
to his bid) and second-price (selling the item to the highest bidder at a price 
FIGURE 8. An auction for a painting.


equal to the second highest bid). In first-price auctions, bidders must bid below 
their value if they are to make any profit; in contrast, in a second-price auction, it is 
optimal for bidders to simply bid their value. Nevertheless, the Revenue Equiva- 
lence Theorem shows that, in equilibrium, if the bidders’ values are independent 
and identically distributed, then the expected auctioneer revenue in the first-price 
and second-price auctions is the same. We also show how to design optimal (i.e., 
revenue-maximizing) auctions under the assumption that the auctioneer has good 
prior information about the bidders’ values for the item he is selling. 

Chapters 15 and 16 discuss truthful mechanisms that go beyond the
second-price auction, in particular, the Vickrey-Clarke-Groves (VCG) mechanism for
maximizing social surplus, the total utility of all participants in the mechanism. A 
key application is to sponsored search auctions, the auctions that search engines 
like Google and Bing run every time you perform a search. In these auctions, the 
bidders are companies who wish to place their advertisements in one of the slots 
you see when you get the results of your search. In Chapter 16 we also discuss
scoring rules. For instance, how can we incentivize a meteorologist to give the 
most accurate prediction he can? 

Chapter 17 considers matching markets. A certain housing market has n
homeowners and n potential buyers. Buyer i has a value $v_{ij}$ for house j. The goal
is to find an allocation of houses to buyers and corresponding prices that are stable;
i.e., there is no pair of buyer and homeowner that can strike a better deal. A related
problem is allocating rooms to renters in a shared rental house. See Figure 9.

Finally, Chapter 18 concerns adaptive decision making. Suppose that
each day several experts suggest actions for you to take; each possible action has a 
reward (or penalty) that varies between days and is revealed only after you choose. 
FIGURE 9. Three roommates need to decide who will get each room, and how
much of the rent each person will pay. 


Surprisingly, there is an algorithm that ensures your average reward over many days 
(almost) matches that of the best expert. If two players in a repeated zero-sum 
game employ such an algorithm, the empirical distribution of play for each of them 
will converge to an optimal strategy. 


For the reader and instructor 


Prerequisites. Readers should have taken basic courses in probability and 
linear algebra. Starred sections and subsections are more difficult; some require 
familiarity with mathematical analysis that can be acquired, e.g., in [Rud76]. 


Courses. This book can be used for different kinds of courses. For instance, an 
undergraduate game theory course could include Chapter 1 (combinatorial games), 
Chapter 2 and most of Chapter 3 on zero-sum games, Chapters 4 and 7 on general- 
sum games and different types of equilibria, Chapter 10 (stable matching), parts 
of Chapters 11 (fair division), 13 (social choice) and possibly 12 (especially the 
Shapley value). Indeed, this book started from lecture notes to such a course that 
was given at Berkeley for several years by the second author. 

A course for computer science students might skip some of the above chapters 
(e.g., combinatorial games), and instead emphasize Chapter 9 on price of anarchy, 
Chapters 14-16 on auctions and VCG, and possibly parts of Chapters 17 (matching 
markets) and 18 (adaptive decision making). The topic of stable matching (Chapter 
10) is a gem that requires no background and could fit in any course. The logical 
dependencies between the chapters are shown in Figure 10.

There are solution outlines to some problems in Such solutions 
are labeled with an “S” in the text. More difficult problems are labeled with a *. 
Additional exercises and material can be found at: 


http://homes.cs.washington.edu/~karlin/GameTheoryAlive 
FIGURE 10. Chapter dependencies


Notes 


There are many excellent books on game theory. In particular, in writing this book, 
we consulted Ferguson [Fer08], Gintis [Gin00], González-Díaz et al. [GDGJFJ10a], Luce 
and Raiffa [LR57], Maschler, Solan, and Zamir [MSZ13], Osborne and Rubinstein [OR94], 
Owen [Owe95], the survey book on algorithmic game theory [Nis07], and the handbooks 
of game theory, Volumes 1-4 (see, e.g., [AH92]). 

The entries in the payoff matrices for zero-sum games represent the utility of the 
players, and throughout the book we assume that the goal of each agent is maximizing 
his expected utility. Justifying this assumption is the domain of utility theory, which is 
discussed in most game theory books. 
The Penalty Kicks matrix we gave was idealized for simplicity. Actual data on 1,417
penalty kicks from professional games in Europe was collected and analyzed by
Palacios-Huerta [PH03]. The resulting matrix is


                  goalie
                 L       R
kicker    L     0.58    0.95
          R     0.93    0.70


Here ‘R’ represents the dominant (natural) side for the kicker. Given these probabilities, 
the optimal strategy for the kicker is (0.38, 0.62) and the optimal strategy for the goalie 
is (0.42, 0.58). The observed frequencies were (0.40, 0.60) for the kicker and (0.423, 0.577)
for the goalie.
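
As a check on the quoted optimal strategies, the indifference conditions for this matrix can be worked out directly. The derivation below assumes that both players randomize (the matrix has no saddle point); p and q are our notation for the probability that the kicker kicks L and the probability that the goalie dives L.

```latex
\begin{align*}
0.58\,p + 0.93\,(1-p) &= 0.95\,p + 0.70\,(1-p)
  &&\Longrightarrow\quad 0.60\,p = 0.23, \quad p = \tfrac{23}{60} \approx 0.38,\\
0.58\,q + 0.95\,(1-q) &= 0.93\,q + 0.70\,(1-q)
  &&\Longrightarrow\quad 0.60\,q = 0.25, \quad q = \tfrac{25}{60} \approx 0.42.
\end{align*}
```

The first line makes the goalie indifferent between diving L and R, the second makes the kicker indifferent between kicking L and R; the resulting scoring probability is about 0.796.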
The early history of the theory of strategic games from Waldegrave to Borel is
discussed in [DD92].
CHAPTER 1


Combinatorial games 


In a combinatorial game, there are two players, a set of positions, and a set of 
legal moves between positions. The players take turns moving from one position 
to another. Some of the positions are terminal. Each terminal position is labelled 
as winning for either player I or player II. We will concentrate on combinatorial 
games that terminate in a finite number of steps. 


EXAMPLE 1.0.1 (Chomp). In Chomp, two players take turns biting off a chunk 
of a rectangular bar of chocolate that is divided into squares. The bottom left corner 
of the bar has been removed and replaced with a broccoli floret. Each player, in 
his turn, chooses an uneaten chocolate square and removes it along with all the 
squares that lie above and to the right of it. The person who bites off the last piece 
of chocolate wins and the loser has to eat the broccoli (i.e., the terminal position
is when all the chocolate is gone). See Figure 1.1. We will return to Chomp in
Example 1.1.6.


FIGURE 1.1. Two moves in a game of Chomp. 


DEFINITION 1.0.2. A combinatorial game with a position set X is said to be 
progressively bounded if, for every starting position $x \in X$, there is a finite
bound on the number of moves until the game terminates. Let B(x) be the maximum
number of moves from x to a terminal position.


Combinatorial games generally fall into two categories. Those for which the
winning positions and the available moves are the same for both players (e.g., Nim)
are called impartial. The player who first reaches one of the terminal positions
wins the game. All other games are called partisan. In such games (e.g., Hex),
either the players have different sets of winning positions, or from some position
their available moves differ.¹


¹ In addition, some partisan games (e.g., Chess) may terminate in a draw (or tie), but we
will not consider those here.


For a given combinatorial game, our goal will be to find out whether one of
the players can always force a win and, if so, to determine the winning strategy — 
the moves this player should make under every contingency. We will show that, in 
a progressively bounded combinatorial game with no ties, one of the players has a 
winning strategy. 


1.1. Impartial games 


EXAMPLE 1.1.1 (Subtraction). Starting with a pile of $x \in \mathbb{N}$ chips, two players
alternate taking 1 to 4 chips. The player who removes the last chip wins. 


Observe that starting from any $x \in \mathbb{N}$, this game is progressively bounded with
B(x) = x. If the game starts with 4 or fewer chips, the first player has a winning 
move: she just removes them all. If there are 5 chips to start with, however, the 
second player will be left with between 1 and 4 chips, regardless of what the first 
player does. 

What about 6 chips? This is again a winning position for the first player 
because if she removes 1 chip, the second player is left in the losing position of 5
chips. The same is true for 7, 8, or 9 chips. With 10 chips, however, the second 
player again can guarantee that he will win. 

Define: 

$$N = \{\, x \in \mathbb{N} : \text{the first (“next”) player can ensure a win if there are } x \text{ chips at the start} \,\},$$

$$P = \{\, x \in \mathbb{N} : \text{the second (“previous”) player can ensure a win if there are } x \text{ chips at the start} \,\}.$$


So far, we have seen that $\{1, 2, 3, 4, 6, 7, 8, 9\} \subseteq N$ and $\{0, 5\} \subseteq P$. Continuing this
line of reasoning, we find that $P = \{x \in \mathbb{N} : x \text{ is divisible by } 5\}$ and $N = \mathbb{N} \setminus P$.
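
The line of reasoning above is a backward induction on the number of chips, and it can be sketched in a few lines. The function names and the `max_take` parameter are illustrative assumptions; the rule used is that a position is in P exactly when no legal move reaches a P-position.

```python
def classify(max_chips, max_take=4):
    """Return a list in_P where in_P[x] is True iff x chips is a P-position."""
    in_P = []
    for x in range(max_chips + 1):
        reachable = [x - t for t in range(1, max_take + 1) if x - t >= 0]
        # x is in P exactly when no move leads to a P-position
        # (in particular, x = 0 has no moves and is in P).
        in_P.append(not any(in_P[y] for y in reachable))
    return in_P


def subtraction_winning_move(x):
    """Number of chips to remove from an N-position so a multiple of 5 remains."""
    return x % 5


in_P = classify(20)
print([x for x, flag in enumerate(in_P) if flag])  # [0, 5, 10, 15, 20]
```

From an N-position x, removing `x % 5` chips leaves the largest multiple of 5 not exceeding x.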

The approach that we used to analyze the Subtraction game can be extended 
to other impartial games. 


DEFINITION 1.1.2. An impartial combinatorial game has two players and a 
set of possible positions. To make a move is to take the game from one position to 
another. More formally, a move is an ordered pair of positions. A terminal position 
is one from which there are no legal moves. For every nonterminal position, there 
is a set of legal moves, the same for both players. Under normal play, the player 
who moves to a terminal position wins. 


We can think of the game positions as nodes and the moves as directed links. 
Such a collection of nodes (vertices) and links (edges) between them is called a 
(directed) graph. At the start of the game, a token is placed at the node corre- 
sponding to the initial position. Subsequently, players take turns moving the token 
along directed edges until one of them reaches a terminal node and is declared the 
winner. 

With this definition, it is clear that the Subtraction Game is an impartial game 
under normal play. The only terminal position is x = 0. Figure 1.2 gives a directed
graph corresponding to the Subtraction Game with initial position x = 14. 

We saw that starting from a position $x \in N$, the next player to move can force
a win by moving to one of the elements in $P = \{5n : n \in \mathbb{N}\}$, namely $5\lfloor x/5 \rfloor$.


DEFINITION 1.1.3. A strategy for a player is a function that assigns a legal 
move to each nonterminal position. A winning strategy from a position x is a 


FIGURE 1.2. Moves in the Subtraction Game. Positions in N are marked in 
red and those in P are marked in black. 


strategy that, starting from x, is guaranteed to result in a win for that player in a 
finite number of steps. 


We can extend the notions of N and P to any impartial game. 


DEFINITION 1.1.4. For any impartial combinatorial game, define N (for “next”)
to be the set of positions such that the first player to move can guarantee a win.
The set of positions for which every move leads to an N-position is denoted by P
(for “previous”), since the player who can force a P-position can guarantee a win.


Let $N_i$ (respectively, $P_i$) be the set of positions from which the next player
(respectively, the previous player) can guarantee a win within at most i moves (of
either player). Note that $P_0 \subseteq P_1 \subseteq P_2 \subseteq \cdots$ and $N_1 \subseteq N_2 \subseteq \cdots$. Clearly

$$N = \bigcup_{i \ge 1} N_i, \qquad P = \bigcup_{i \ge 0} P_i.$$

The sets $N_i$ and $P_i$ can be determined recursively:

$$P_1 := P_0 := \{\text{terminal positions}\},$$
$$N_{i+1} := \{\text{positions } x \text{ for which there is a move leading to } P_i\},$$
$$P_{i+1} := \{\text{positions } y \text{ such that each move leads to } N_i\}.$$


In the Subtraction Game, we have 


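$$P_1 = P_0 = \{0\},$$
$$N_2 = N_1 = \{1, 2, 3, 4\}, \qquad P_3 = P_2 = \{0, 5\},$$
$$N_4 = N_3 = \{1, 2, 3, 4, 6, 7, 8, 9\}, \qquad P_5 = P_4 = \{0, 5, 10\},$$
$$N = \mathbb{N} \setminus 5\mathbb{N}, \qquad P = 5\mathbb{N}.$$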
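
The recursion for the level sets can be run mechanically on any finite game graph. The sketch below assumes the game is given as a dictionary mapping each position to the list of positions reachable in one move (a representation chosen here for illustration), and it reproduces the first few sets listed above for the Subtraction Game restricted to at most 14 chips.

```python
def level_sets(moves, rounds):
    """Compute N_1..N_rounds and P_0..P_rounds for a game graph.

    moves[x] is the list of positions reachable from x in one move;
    terminal positions have an empty list.
    """
    terminal = {x for x, succ in moves.items() if not succ}
    P = {0: set(terminal), 1: set(terminal)}   # P_1 := P_0 := terminal positions
    N = {}
    for i in range(rounds):
        # N_{i+1}: some move leads into P_i
        N[i + 1] = {x for x, succ in moves.items() if any(y in P[i] for y in succ)}
        # P_{i+1}: every move leads into N_i (defined by the recursion for i >= 1)
        if i >= 1:
            P[i + 1] = {x for x, succ in moves.items() if all(y in N[i] for y in succ)}
    return N, P


# Subtraction Game on positions 0..14: remove 1 to 4 chips.
subtraction = {x: [x - t for t in range(1, 5) if x - t >= 0] for x in range(15)}
N, P = level_sets(subtraction, 5)
print(sorted(N[3]), sorted(P[4]))
```

Terminal positions satisfy the "every move leads to N_i" condition vacuously, which is why 0 stays in every P_i.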


THEOREM 1.1.5. In a progressively bounded impartial combinatorial game under
normal play,² all positions lie in $N \cup P$. Thus, from any initial position, one
of the players has a winning strategy.


PROOF. Recall that B(x) is the maximum number of moves from x to a terminal
position. We prove by induction on n that all positions x with $B(x) \le n$ are
in $N_n \cup P_n$.


² Recall that normal play means that the player who moves to a terminal position wins.
Certainly, for all x such that B(x) = 0, we have that $x \in P_0 \subseteq P$. Now
consider any position z for which B(z) = n + 1. Then every move from z leads to
a position w with $B(w) \le n$. There are two cases:

Case 1: Each move from z leads to a position in $N_n$. Then $z \in P_{n+1}$.

Case 2: There is a move from z to a position $w \notin N_n$. Since $B(w) \le n$, the
inductive hypothesis implies that $w \in P_n$. Thus, $z \in N_{n+1}$.

Hence, all positions lie in $N \cup P$. If the starting position is in N, then the first
player has a winning strategy; otherwise, the second player does.


EXAMPLE 1.1.6 (Chomp Revisited). Recall the game of Chomp from Example 1.0.1.
Since Chomp is progressively bounded, Theorem 1.1.5 implies that one
of the players must have a winning strategy. We will show that it is the first player.




FIGURE 1.3. The graph representation of a 2 x 3 game of Chomp: Every 
move from a P-position leads to an N-position (bold black links); from every 
N-position there is at least one move to a P-position (red links). 


THEOREM 1.1.7. Starting from a position in which the remaining chocolate bar 
is rectangular of size greater than 1 x 1, the next player to move has a winning 
strategy. 


PROOF. Given a rectangular bar of chocolate R of size greater than 1 x 1, let
$R^-$ be the result of chomping off the upper-right 1 x 1 corner of R.

If $R^- \in P$, then $R \in N$, and a winning move is to chomp off the upper-right
corner.

If $R^- \in N$, then there is a move from $R^-$ to some position x in P. But if we
can chomp $R^-$ to get x, then chomping R in the same way will also give x, since
the upper-right corner will be removed by any such chomp. Since there is a move
from R to the position x in P, it follows that $R \in N$.
The technique used in this proof is called strategy-stealing. Note that the proof
does not show that chomping the upper-right corner is a winning move. In the
2 x 3 case, chomping the upper-right corner happens to be a winning move (since
this leads to a position in P; see Figure 1.3), but for the 3 x 3 case, chomping the
upper-right corner is not a winning move. The strategy-stealing argument merely 
shows that a winning strategy for the first player must exist; it does not help us 
identify the strategy. 
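
For small bars, the winning first moves that strategy-stealing leaves unidentified can be found by exhaustive search. The sketch below is one possible encoding, assumed here for illustration: a position is a tuple of row lengths listed from the bottom row up, counting the broccoli square at the bottom-left corner, and squares are indexed as (row, column) starting from 0.

```python
from functools import lru_cache


def chomp_moves(rows):
    """List ((row, col), new_position) for every legal bite from `rows`."""
    result = []
    for i, length in enumerate(rows):
        for j in range(length):
            if (i, j) == (0, 0):
                continue  # the broccoli square is never bitten
            # Biting (i, j) removes every square at or above row i
            # and at or to the right of column j.
            new = tuple(r if k < i else min(r, j) for k, r in enumerate(rows))
            result.append(((i, j), new))
    return result


@lru_cache(maxsize=None)
def is_N(rows):
    """True iff the player to move from `rows` can force a win."""
    return any(not is_N(new) for _, new in chomp_moves(rows))


def winning_moves(rows):
    """Squares whose removal leaves the opponent in a P-position."""
    return [square for square, new in chomp_moves(rows) if not is_N(new)]


print(winning_moves((3, 3)))     # 2 x 3 bar: two rows of three squares
print(winning_moves((3, 3, 3)))  # 3 x 3 bar
```

In this encoding the upper-right corner of the 2 x 3 bar is the square (1, 2) and that of the 3 x 3 bar is (2, 2), so the two printed lists can be compared with the claims in the paragraph above.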


1.1.1. Nim. Next we analyze the game of Nim, a particularly important pro- 
gressively bounded impartial game. 


EXAMPLE 1.1.8 (Nim). In Nim, there are several piles, each containing finitely 
many chips. A legal move is to remove a positive number of chips from a single 
pile. Two players alternate turns with the aim of removing the last chip. Thus, the 
terminal position is the one where there are no chips left. 


Because Nim is progressively bounded, all the positions are in N or P, and 
one of the players has a winning strategy. We will describe the winning strategy 
explicitly in the next section. 

As usual, we will analyze the game by working backwards from the terminal
positions. We denote a position in the game by $(n_1, n_2, \dots, n_k)$, meaning that there
are k piles of chips and that the first has $n_1$ chips in it, the second has $n_2$, and so
on.


Certainly (0,1) and (1,0) are in N. On the other hand, $(1,1) \in P$ because
either of the two available moves leads to (0,1) or (1,0). We see that $(1,2), (2,1) \in N$
because the next player can create the position $(1,1) \in P$. More generally,
$(n,n) \in P$ for $n \in \mathbb{N}$ and $(n,m) \in N$ if $n, m \in \mathbb{N}$ are not equal.


FIGURE 1.4. This figure shows why (n,m) with n < m is in N: The next 
player’s winning strategy is to remove m — n chips from the bigger pile. 


Moving to three piles, we see that $(1,2,3) \in P$, because whichever move the
first player makes, the second can force two piles of equal size. It follows that
$(1,2,3,4) \in N$ because the next player to move can remove the fourth pile.

To analyze (1,2,3,4,5), we will need the following lemma: 


LEMMA 1.1.9. For two Nim positions $X = (x_1, \dots, x_k)$ and $Y = (y_1, \dots, y_\ell)$,
we denote the position $(x_1, \dots, x_k, y_1, \dots, y_\ell)$ by (X,Y).
(1) If X and Y are in P, then $(X,Y) \in P$.
(2) If $X \in P$ and $Y \in N$ (or vice versa), then $(X,Y) \in N$.
PROOF. If (X,Y) has 0 chips, then X, Y, and (X,Y) are all P-positions, so
the lemma is true in this case.
Next, we suppose by induction that whenever (X,Y) has n or fewer chips,

$X \in P$ and $Y \in P$ implies $(X,Y) \in P$

and

$X \in P$ and $Y \in N$ implies $(X,Y) \in N$.

Suppose (X,Y) has at most n + 1 chips. If $X \in P$ and $Y \in N$, then the next player
to move can reduce Y to a position in P, creating a (P,P) configuration with at
most n chips, so by the inductive hypothesis it must be in P. Thus, $(X,Y) \in N$.
If $X \in P$ and $Y \in P$, then the next player must move to an (N,P) or (P,N)
position with at most n chips, which by the inductive hypothesis is an N-position.
Thus, $(X,Y) \in P$.


Going back to our example, (1,2,3,4,5) can be divided into two subgames: 
$(1,2,3) \in P$ and $(4,5) \in N$. By the lemma, $(1,2,3,4,5) \in N$.


REMARK 1.1.10. Note that if $X, Y \in N$, then (X,Y) can be either in P or
in N. E.g., $(1,1) \in P$ but $(1,2) \in N$. Thus, the divide-and-sum method (that
is, using Lemma 1.1.9) for analyzing a position is limited. For instance, it does
not classify any configuration of three piles of different sizes, since every nonempty
proper subset of piles is in N.


1.1.2. Bouton’s solution of Nim. We next describe a simple way of determining
if a state is in P or N: We explicitly describe a set Z of configurations
(containing the terminal position) such that, from every position in Z, all moves
lead to $Z^c$ (the complement of Z), and from every position in $Z^c$, there is a move
to Z. It will then follow by induction that Z = P.

Such a set Z can be defined using the notion of Nim-sum. Given integers
$x_1, x_2, \dots, x_k$, the Nim-sum $x_1 \oplus x_2 \oplus \cdots \oplus x_k$ is obtained by writing each $x_i$ in
binary and then adding the digits in each column mod 2. For example:


                                 decimal | binary
$x_1$                               3    |  0011
$x_2$                               9    |  1001
$x_3$                              13    |  1101
$x_1 \oplus x_2 \oplus x_3$         7    |  0111


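DEFINITION 1.1.11. The Nim-sum $x_1 \oplus x_2 \oplus \cdots \oplus x_k$ of a configuration
$(x_1, x_2, \dots, x_k)$ is defined as follows: Write each pile size $x_i$ in binary; i.e.,
$x_i = \sum_{j \ge 0} x_{ij} 2^j$ where $x_{ij} \in \{0, 1\}$. Then

$$x_1 \oplus x_2 \oplus \cdots \oplus x_k = \sum_{j \ge 0} (x_{1j} \oplus x_{2j} \oplus \cdots \oplus x_{kj})\, 2^j,$$

where for bits,

$$x_{1j} \oplus x_{2j} \oplus \cdots \oplus x_{kj} = \Big( \sum_{i=1}^{k} x_{ij} \Big) \bmod 2.$$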
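
The Nim-sum is the bitwise exclusive-or of the pile sizes, so it is a one-liner in most languages. The sketch below gives that one-liner together with a literal column-by-column transcription of the definition, so the two can be compared; both function names are illustrative.

```python
from functools import reduce
from operator import xor


def nim_sum(piles):
    """Nim-sum as bitwise XOR of the pile sizes."""
    return reduce(xor, piles, 0)


def nim_sum_by_columns(piles):
    """Nim-sum computed as in the definition: add the bits of column j mod 2."""
    total, j = 0, 0
    while any(x >> j for x in piles):      # some pile still has a bit at position >= j
        bit = sum((x >> j) & 1 for x in piles) % 2
        total += bit << j
        j += 1
    return total


print(nim_sum([3, 9, 13]), nim_sum_by_columns([3, 9, 13]))  # 7 7
```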


THEOREM 1.1.12 (Bouton’s Theorem). A Nim position $x = (x_1, x_2, \dots, x_k)$
is in P if and only if the Nim-sum of its components is 0.
To illustrate the theorem, consider the starting position (1, 2, 3):


decimal | binary 
1 01 
2 10 
3 11 
0 00 


Adding the two columns of the binary expansions modulo two, we obtain 00. The 
theorem affirms that $(1, 2, 3) \in P$. Now, we prove Bouton’s Theorem.


PROOF OF THEOREM 1.1.12. Define Z to be those positions with Nim-sum
zero. We will show that:


(a) From every position in Z, all moves lead to $Z^c$.
(b) From every position in $Z^c$, there is a move to Z.


(a) Let $x = (x_1, \dots, x_k) \in Z \setminus \{0\}$. Suppose that we reduce pile $\ell$, leaving $x'_\ell < x_\ell$
chips. This must result in some bit in the binary representation of $x_\ell$, say the jth,
changing from 1 to 0. The number of 1’s in the jth column was even, so after the
reduction it is odd, and the Nim-sum of the new position is nonzero.

(b) Next, suppose that $x = (x_1, x_2, \dots, x_k) \notin Z$. Let $s = x_1 \oplus \cdots \oplus x_k \neq 0$.
Let j be the position of the leftmost 1 in the binary expression for s. There are an odd
number of values of $i \in \{1, \dots, k\}$ with a 1 in position j. Choose one such i. Now
$x_i \oplus s$ has a 0 in position j and agrees with $x_i$ in positions j + 1, j + 2, ... to the
left of j, so $x_i \oplus s < x_i$. Consider the move which reduces the ith pile size from $x_i$
to $x_i \oplus s$. The Nim-sum of the resulting position $(x_1, \dots, x_{i-1}, x_i \oplus s, x_{i+1}, \dots, x_k)$
is 0, so this new position lies in Z. Here is an example with i = 1 and $x_1 \oplus s = 3$.


decimal | binary decimal | binary 

£1 6 0110 108 3 0011 

T2 12 1100 T2 12 1100 

T3 15 1111 T3 15 1111 

s = £1 022023 5 0101 (x1 ® s) ® T2 O43 0 0000 


This verifies (b). 
It follows by induction on n that Z and P coincide on configurations with at 
most n chips. We also obtain the winning strategy: For any Nim-position that is 


not in Z, the next player should move to a position in Z, as described in the proof 
of (b). 
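
The move constructed in part (b) of the proof can be written as a short procedure. The sketch below is illustrative only; the name `winning_move` and the 0-based pile indexing are our conventions, not the text's. It uses the fact that $x_i \oplus s < x_i$ holds exactly when $x_i$ has a 1 in the position of the leftmost 1 of $s$.

```python
from functools import reduce
from operator import xor


def winning_move(piles):
    """Return (i, new_size) for a move to Nim-sum 0, or None for a P-position.

    The index i is 0-based, so i = 0 refers to the first pile.
    """
    s = reduce(xor, piles, 0)
    if s == 0:
        return None                      # every move leads out of Z
    for i, x in enumerate(piles):
        if x ^ s < x:                    # x has a 1 in the leading bit of s
            return i, x ^ s
    raise AssertionError("unreachable: some pile has the leading bit of s set")


print(winning_move([6, 12, 15]))   # (0, 3): reduce the first pile from 6 to 3
print(winning_move([1, 2, 3]))     # None: the position (1, 2, 3) is in P
```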


1.1.3. Other impartial games. We next consider two other games that are
just Nim in disguise.


EXAMPLE 1.1.13 (Rims). A starting position consists of a finite number of 
dots in the plane and a finite number of continuous loops that do not intersect. 
Each loop must pass through at least one dot. Each player, in his turn, draws a 
new loop that does not intersect any other loop. The goal is to draw the last such 
loop. 


Next, we analyze the game. For a given position of Rims, we say that two
uncovered dots are equivalent if there is a continuous path between them that does
not intersect any loops. This partitions the uncovered dots into equivalence classes.


FIGURE 1.5. Two moves in a game of Rims.


Classical plane topology ensures that for any equivalence class of, say, k dots, and
any integers $w, u, v \ge 0$ such that $w + u + v = k$ and $w \ge 1$, a loop can be drawn
through w dots so that u dots are inside the loop (forming one equivalence class)
and v dots are outside (forming another).

To see the connection to Nim, think of each class of dots as a pile of chips. A 
loop, because it passes through at least one dot, in effect, removes at least one chip 
from a pile and splits the remaining chips into two new piles. This last part is not 
consistent with the rules of Nim unless the player draws the loop so as to leave the 
remaining chips in a single pile. 


FIGURE 1.6. Equivalent sequence of moves in Nim with splittings allowed.


Thus, Rims is equivalent to a variant of Nim where players have the option of 
splitting a pile into two piles after removing chips from it. As the following theorem 
shows, the fact that players have the option of splitting piles has no impact on the 
analysis of the game. 


THEOREM 1.1.14. The sets N and P coincide for Nim and Rims. 


PROOF. Let $x = (x_1, \ldots, x_k)$ be a position in Rims, represented by the number
of dots in each equivalence class. Let Z be the collection of Rims positions with
Nim-sum 0.


From any position $x \notin Z$, there is a move in Nim, which is also legal in Rims,
to a position in Z.

Given a Rims position $x \in Z \setminus \{0\}$, we must verify that every Rims move leads to
$Z^c$. We already know this for Nim moves, so it suffices to consider a move where
some equivalence class $x_\ell$ is reduced to two new equivalence classes of sizes u and
v, where $u + v < x_\ell$. Since $u \oplus v \le u + v < x_\ell$, it follows that $u \oplus v$ and $x_\ell$ must
disagree in some binary digit, so replacing $x_\ell$ by $(u, v)$ must change the Nim-sum.
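
The conclusion of the theorem can also be checked by brute force on small positions. The sketch below is a numerical sanity check under our own encoding, not part of the proof: a position of Nim with optional splitting is a sorted tuple of positive pile sizes, and the helper names `moves` and `is_p_position` are hypothetical.

```python
from functools import lru_cache, reduce
from itertools import combinations_with_replacement
from operator import xor


def moves(position):
    """Yield every position reachable in one move of Nim with optional splitting."""
    for idx, x in enumerate(position):
        rest = position[:idx] + position[idx + 1:]
        for remaining in range(x):                 # remove x - remaining >= 1 chips
            for u in range(remaining // 2 + 1):    # split the remainder as u + v, u <= v
                v = remaining - u
                new_piles = tuple(p for p in (u, v) if p > 0)
                yield tuple(sorted(rest + new_piles))


@lru_cache(maxsize=None)
def is_p_position(position):
    """True if every move leads to an N-position (the previous player wins)."""
    return all(not is_p_position(nxt) for nxt in moves(position))


# Compare with the Nim-sum criterion on all positions with at most 3 piles of size <= 5.
mismatches = [
    pos
    for k in range(4)
    for pos in combinations_with_replacement(range(1, 6), k)
    if is_p_position(pos) != (reduce(xor, pos, 0) == 0)
]
print(mismatches)   # an empty list means the two criteria agree on these positions
```

Taking u = 0 in the inner loop corresponds to an ordinary Nim move, so the same enumeration covers both kinds of moves.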


EXERCISE 1.a (Staircase Nim). This game is played on a staircase of n steps.
On each step j for $j = 1, \ldots, n$ is a stack of coins of size $x_j \ge 0$.

Each player, in his turn, picks j and moves one or more coins from step j to step
$j - 1$. Coins reaching the ground (step 0) are removed from play. The game ends
when all coins are on the ground, and the last player to move wins. See Figure 1.7.

Show that the P-positions in Staircase Nim are the positions such that the
stacks of coins on the odd-numbered steps have Nim-sum 0.


FIGURE 1.7. A move in Staircase Nim, in which two coins are moved from
step 3 to step 2. Considering the odd stairs only, the above move is equivalent
to the move in regular Nim from (3,5) to (3,3).
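
Before attempting the proof, the claim in Exercise 1.a can be tested numerically on small staircases. The following sketch is only an experiment, not a solution; the encoding (a tuple whose entry at index j − 1 is the stack on step j) and the function names are our assumptions.

```python
from functools import lru_cache, reduce
from itertools import product
from operator import xor


@lru_cache(maxsize=None)
def staircase_is_p(stacks):
    """True if the Staircase Nim position is in P; stacks[j-1] is the stack on step j."""
    for idx, coins in enumerate(stacks):
        for m in range(1, coins + 1):          # move m coins down one step
            nxt = list(stacks)
            nxt[idx] -= m
            if idx > 0:
                nxt[idx - 1] += m              # coins leaving step 1 are removed from play
            if staircase_is_p(tuple(nxt)):
                return False                   # a move to P exists, so this position is in N
    return True


def odd_steps_nim_sum(stacks):
    """Nim-sum of the stacks on steps 1, 3, 5, ... (indices 0, 2, 4, ...)."""
    return reduce(xor, stacks[0::2], 0)


# All staircases with 4 steps and at most 3 coins per step.
mismatches = [s for s in product(range(4), repeat=4)
              if staircase_is_p(s) != (odd_steps_nim_sum(s) == 0)]
print(mismatches)   # an empty list means no counterexample among these positions
```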


1.2. Partisan games 


A combinatorial game in which the legal moves in some positions, or the sets
of winning positions, differ for the two players is called partisan.

While an impartial combinatorial game can be represented as a graph with a
single edge-set, a partisan game is most often given by a set of nodes X representing
the positions of the game and two sets of directed edges that represent the legal
moves available to either player. Let $E_{\mathrm{I}}, E_{\mathrm{II}}$ be the two edge-sets for players I
and II, respectively. If $(x, y)$ is a legal move for player $i \in \{\mathrm{I}, \mathrm{II}\}$, then $(x, y) \in E_i$,
and we say that y is a successor of x. We write $S_i(x) = \{y : (x, y) \in E_i\}$.

We start with a simple example: 


EXAMPLE 1.2.1 (A partisan subtraction game). Starting with a pile of
$x \in \mathbb{N}$ chips, two players, I and II, alternate taking a certain number of chips.
Player I can remove 1 or 4 chips. Player II can remove 2 or 3 chips. The last player
who removes chips wins the game.


This is a progressively bounded partisan game where both the terminal nodes
and the moves are different for the two players. See Figure 1.8.


FIGURE 1.8. The partisan Subtraction game: The red dotted (respectively,
green solid) edges represent moves of player I (respectively, II). Node 0 is
terminal for either player, and node 1 is also terminal if the last move was by
player I.


From this example we see that the number of steps it takes to complete the
game from a given position now depends on the state of the game, $s = (x, i)$,
where x denotes the position and $i \in \{\mathrm{I}, \mathrm{II}\}$ denotes the player that moves next.
We let $B(x, i)$ denote the maximum number of moves to complete the game from
state $(x, i)$.
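
For the partisan subtraction game of Example 1.2.1, both the winner and the bound $B(x, i)$ can be computed by recursion on the state $(x, i)$. The sketch below is one way to do this; the names `MOVES`, `OTHER`, and `mover_wins` are ours, and players are encoded as the strings "I" and "II".

```python
from functools import lru_cache

MOVES = {"I": (1, 4), "II": (2, 3)}     # numbers of chips each player may remove
OTHER = {"I": "II", "II": "I"}


@lru_cache(maxsize=None)
def mover_wins(x, player):
    """True if `player`, about to move with x chips left, can force a win."""
    return any(
        not mover_wins(x - m, OTHER[player])
        for m in MOVES[player] if m <= x
    )


@lru_cache(maxsize=None)
def B(x, player):
    """Maximum number of moves until the game ends from state (x, player)."""
    options = [B(x - m, OTHER[player]) for m in MOVES[player] if m <= x]
    return 1 + max(options) if options else 0


for x in range(8):
    print(x, mover_wins(x, "I"), mover_wins(x, "II"), B(x, "I"), B(x, "II"))
```

A player with no legal move (for example, player II facing a single chip) has an empty list of options, so `mover_wins` returns False and `B` returns 0 for that state.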


The following theorem is analogous to Theorem 1.1.5.


THEOREM 1.2.2. In any progressively bounded combinatorial game with no ties 
allowed, one of the players has a winning strategy which depends only upon the 
current state of the game. 


EXERCISE 1.b. Prove Theorem 1.2.2.


Theorem 1.2.2 relies essentially on the game being progressively bounded. Next
we show that many games have this property.


LEMMA 1.2.3. In a game with a finite position set, if the players cannot move 
to repeat a previous game state, then the game is progressively bounded. 


PROOF. If there are n positions x in the game, there are 2n possible game states
$(x, i)$, where i is one of the players. When the players play from position $(x, i)$, the
game can last at most 2n steps, since otherwise a state would be repeated.


The games of Chess and Go both have special rules to ensure that the game 
is progressively bounded. In Chess, whenever the board position (together with 
whose turn it is) is repeated for a third time, the game is declared a draw. (Thus 
the real game state effectively has built into it all previous board positions.) In Go, 
it is not legal to repeat a board position (together with whose turn it is), and this 
has a big effect on how the game is played.


1.2.1. The game of Hex. Recall the description of Hex from the preface.


EXAMPLE 1.2.4 (Hex). Hex is played on a rhombus-shaped board tiled with 
hexagons. Each player is assigned a color, either blue or yellow, and two opposing 
sides of the board. The players take turns coloring in empty hexagons. The goal 
for each player is to link his two sides of the board with a chain of hexagons in his 
color. Thus, the terminal positions of Hex are the full or partial colorings of the 
board that have a chain crossing. 


FIGURE 1.9. A completed game of Hex with a yellow chain crossing. 


Note that Hex is a partisan, progressively bounded game where both the terminal
positions and the legal moves are different for the two players. Below, we will
prove that any fully colored, standard Hex board contains either a
blue crossing or a yellow crossing but not both. This topological fact guarantees
that ties are not possible, so one of the players must have a winning strategy. We
will now prove, again using a strategy-stealing argument, that the first player can
always win.


THEOREM 1.2.5. On a standard, symmetric Hex board of arbitrary size, the 
first player has a winning strategy. 


PROOF. We know that one of the players has a winning strategy. Suppose that 
the second player has a winning strategy S. The first player, on his first move, just 
colors in an arbitrarily chosen hexagon. Subsequently, the first player ignores his 
first move and plays S rotated by 90°. If S requires that the first player move in 
the spot that he chose in his first turn and there are empty hexagons left, he just 
picks another arbitrary spot and moves there instead. 

Having an extra hexagon on the board can never hurt the first player — it can 
only help him. In this way, the first player, too, is guaranteed to win, implying that 
both players have winning strategies, a contradiction. 


1.2.2. Topology and Hex: A path of arrows*. We now present two proofs 
that any colored standard Hex board contains a monochromatic crossing (and all 
such crossings have the same color). The proof in this section is quite general 
and can be applied to nonstandard boards. The proof in the next section has the 
advantage of showing that there can be no more than one crossing, a statement 
that seems obvious but is quite difficult to prove.

In the following discussion, precolored hexagons are referred to as boundary.
Uncolored hexagons are called interior. Without loss of generality, we may assume
that the edges of the board are made up of precolored hexagons (see Figure 1.10).
Thus, the interior hexagons are surrounded by hexagons on all sides.


THEOREM 1.2.6. For a filled-in standard Hex board with nonempty interior and 
with the boundary divided into two disjoint yellow and two disjoint blue segments, 
there is always at least one crossing between a pair of segments of like color. 


PROOF. Along every edge separating a blue hexagon and a yellow one, insert
an arrow so that the blue hexagon is to the arrow’s left and the yellow one to its
right. In the initial position, there will be four such arrows, two directed toward
the interior of the board (call these entry arrows) and two directed away from the
interior (call these exit arrows). See the left side of Figure 1.10.


FIGURE 1.10. The left figure shows the entry and exit arrows on an empty
board. The right figure shows a filled-in board with a blue crossing on the left
side of the directed path.


Now, suppose the board has been arbitrarily filled with blue and yellow hexagons. 
Starting with one of the entry arrows, we will show that it is possible to construct 
a continuous path by adding arrows tail-to-head always keeping a blue hexagon on 
the left and a yellow on the right. 

In the interior of the board, when two hexagons share an edge with an arrow,
there is always a third hexagon which meets them at the vertex toward which the
arrow is pointing. If that third hexagon is blue, the next arrow will turn to the
right. If the third hexagon is yellow, the arrow will turn to the left. See (a) and
(b) of Figure 1.11. Thus, every arrow (except exit arrows) has a unique successor.
Similarly, every arrow (except entry arrows) has a unique predecessor.

Because we started our path at the boundary, where yellow and blue meet, our
path will never contain a loop. If it did, the first arrow in the loop would have two
predecessors. See (c) of Figure 1.11. Since there are finitely many available edges
on the board and our path has no loops, it must eventually exit the board via one
of the exit arrows.

All the hexagons on the left of such a path are blue, while those on the right
are yellow. If the exit arrow touches the same yellow segment of the boundary as
the entry arrow, there is a blue crossing (see Figure 1.10). If it touches the same
blue segment, there is a yellow crossing.


FIGURE 1.11. In (a) the third hexagon is blue and the next arrow turns to
the right; in (b) the next arrow turns to the left; in (c) we see that in order to
close the loop an arrow would have to pass between two hexagons of the same
color.
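
The existence of a crossing can be checked experimentally on small boards. The sketch below is an illustration under assumptions of ours: the rhombus is stored as an n × n array in which cell (r, c) is adjacent to (r, c ± 1), (r ± 1, c), (r − 1, c + 1), and (r + 1, c − 1); blue ("B") tries to link the top and bottom rows and yellow ("Y") the left and right columns. It searches for a crossing directly rather than following the path of arrows used in the proof.

```python
import random
from collections import deque

NEIGHBORS = [(-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0)]


def has_crossing(board, color):
    """Breadth-first search for a chain of `color` linking that color's two sides."""
    n = len(board)
    axis = 0 if color == "B" else 1      # blue spans rows, yellow spans columns
    starts = [(r, c) for r in range(n) for c in range(n)
              if (r, c)[axis] == 0 and board[r][c] == color]
    queue, seen = deque(starts), set(starts)
    while queue:
        cell = queue.popleft()
        if cell[axis] == n - 1:
            return True
        for dr, dc in NEIGHBORS:
            nxt = (cell[0] + dr, cell[1] + dc)
            if (0 <= nxt[0] < n and 0 <= nxt[1] < n
                    and nxt not in seen and board[nxt[0]][nxt[1]] == color):
                seen.add(nxt)
                queue.append(nxt)
    return False


rng = random.Random(0)
board = [[rng.choice("BY") for _ in range(5)] for _ in range(5)]
print(has_crossing(board, "B"), has_crossing(board, "Y"))
```

On every filled-in board, Theorem 1.2.6 guarantees that at least one of the two calls returns True; that exactly one does is the uniqueness statement taken up in Section 1.2.3.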


1.2.3. Hex and Y. That there cannot be more than one crossing in the game 
of Hex seems obvious until you actually try to prove it carefully. To do this di- 
rectly, we would need a discrete analog of the Jordan curve theorem, which says 
that a continuous closed curve in the plane divides the plane into two connected 
components. The discrete version of the theorem is considerably easier than the 
continuous one, but it is still quite challenging to prove. 

Thus, rather than attacking this claim directly, we will resort to a trick: We 
will instead prove a similar result for a related, more general game — the game of 
Y, also known as Tripod. 


EXAMPLE 1.2.7. The Game of Y is played on a triangular board tiled with 
hexagons. As in Hex, the two players take turns coloring in hexagons, each using his 
assigned color. A player wins when he establishes a Y, a monochromatic connected 
region in his color that meets all three sides of the triangle. 


Playing Hex is equivalent to playing Y with some of the hexagons precolored,
as shown in Figure 1.12.


FIGURE 1.12. Hex is a special case of Y. (Panels: "Blue has a winning Y here."
and "Reduction of Hex to Y.")


We will first show below that a filled-in Y board always contains a single Y. 
Because Hex is equivalent to Y with certain hexagons precolored, the existence and 
uniqueness of the chain crossing is inherited by Hex from Y. 


THEOREM 1.2.8. Any blue/yellow coloring of the triangular board contains either
a blue Y or a yellow Y, but not both.


PROOF. We can reduce a colored board with sides of length n to one with sides
of length $n - 1$ as follows: Think of the board as an arrow pointing right. Except
for the leftmost column of hexagons, each hexagon is the right tip of a small
arrow-shaped cluster of three adjacent hexagons pointing the same way as the board.
We call such a triple a triangle. See Figure 1.13. Starting from the right, recolor
each hexagon the majority color of the triangle that it tips, removing the leftmost
column of hexagons altogether.


FIGURE 1.13. A step-by-step reduction of a colored Y board.


We claim that a board of side-length n contains a monochromatic Y if and only
if the resulting board of size $n - 1$ does: Suppose the board of size n contains, say,
a blue Y. Let $(h_1, h_2, \ldots, h_L)$ be a path in this Y from one side of the board to
another, where each $h_i$ is a blue hexagon. We claim that there is a corresponding
path $T_1, \ldots, T_{L-1}$ of triangles where $T_i$ is the unique triangle containing $(h_i, h_{i+1})$.
The set of rightmost hexagons in each of these triangles yields the desired blue
path in the reduced graph. Similarly, there is a blue path from some $h_i$ to the third side
of the original board that becomes a blue path in the reduced board, creating the
desired Y.


FIGURE 1.14. The construction of a blue path in the reduced board.


For the converse, a blue path between two sides A and B in the smaller board
induces a path $T_1, \ldots, T_m$ of overlapping, majority blue triangles between A and B
on the larger board. Suppose that we have shown that there is a path $h_1, \ldots, h_\ell$
(possibly with repetitions) of blue hexagons that starts at A and ends at $h_\ell \in T_k$ in
the larger board. If $T_k \cap T_{k+1}$ is blue, take $h_{\ell+1} = T_k \cap T_{k+1}$. Otherwise, the four
hexagons in the symmetric difference $T_k \,\Delta\, T_{k+1}$ are all blue and form a connected
set. Extending the path by at most two of these hexagons, $h_{\ell+1}, h_{\ell+2}$, will reach
$T_{k+1}$. See Figure 1.15.


FIGURE 1.15. An illustration of the four possibilities for how a blue path going
through $T_k$ and $T_{k+1}$ reduces to a blue path in the smaller board.


Thus, we can inductively reduce the board of size n to a board of size one, a 
single, colored cell. By the argument above, the color of this last cell is the color 
of a winning Y on the original board. 
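
The reduction in the proof is easy to carry out by machine. The following sketch makes several assumptions of ours: the triangular board is stored as a list of rows, where row i (0-based) has i + 1 cells; cell (i, j) is adjacent to (i, j ± 1), (i − 1, j − 1), (i − 1, j), (i + 1, j), and (i + 1, j + 1); and each cell (i, j) above the last row is the tip of the triangle {(i, j), (i + 1, j), (i + 1, j + 1)}. In this encoding the last row plays the role of the "leftmost column" in the proof. The function `has_y` is a direct search included only to compare against the reduction.

```python
import random

Y_NEIGHBORS = [(0, -1), (0, 1), (-1, -1), (-1, 0), (1, 0), (1, 1)]


def reduce_once(board):
    """Shrink a board of side n to side n - 1 by majority vote on each triangle."""
    def majority(a, b, c):
        return "B" if [a, b, c].count("B") >= 2 else "Y"

    return [
        [majority(board[i][j], board[i + 1][j], board[i + 1][j + 1])
         for j in range(i + 1)]
        for i in range(len(board) - 1)
    ]


def winner_by_reduction(board):
    """Color of the single cell left after repeated reduction."""
    while len(board) > 1:
        board = reduce_once(board)
    return board[0][0]


def has_y(board, color):
    """Direct check: does a connected region of `color` touch all three sides?"""
    n, seen = len(board), set()
    for i in range(n):
        for j in range(i + 1):
            if board[i][j] != color or (i, j) in seen:
                continue
            stack, sides = [(i, j)], set()
            seen.add((i, j))
            while stack:
                r, c = stack.pop()
                if c == 0:
                    sides.add("left")
                if c == r:
                    sides.add("right")
                if r == n - 1:
                    sides.add("bottom")
                for dr, dc in Y_NEIGHBORS:
                    rr, cc = r + dr, c + dc
                    if (0 <= rr < n and 0 <= cc <= rr
                            and (rr, cc) not in seen and board[rr][cc] == color):
                        seen.add((rr, cc))
                        stack.append((rr, cc))
            if len(sides) == 3:
                return True
    return False


rng = random.Random(1)
board = [[rng.choice("BY") for _ in range(i + 1)] for i in range(6)]
color = winner_by_reduction(board)
print(color, has_y(board, color))
```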


Because any colored Y board contains one and only one winning Y, it follows 
that any colored Hex board contains one and only one crossing. 


REMARK 1.2.9. Why did we introduce Y instead of carrying out the proof 
directly for Hex? Hex corresponds to a subclass of Y boards, but this subclass is 
not preserved under the reduction we applied in the proof. 


1.2.4. More general boards*. The statement that any colored Hex board 
contains exactly one crossing is stronger than the statement that every sequence of 
moves in a Hex game always leads to a crossing, i.e., a terminal position. To see 
why it’s stronger, consider the following variant of Hex. 


EXAMPLE 1.2.10. Six-sided Hex is similar to ordinary Hex, but the board 
is hexagonal, rather than square. Each player is assigned three nonadjacent sides 
and the goal for each player is to create a crossing in his color between two of his 
assigned sides. Thus, the terminal positions are those that contain one and only 
one monochromatic crossing between two like-colored sides. 


In Six-sided Hex, there can be crossings of both colors in a completed board, 
but the game ends when the first crossing is created. 


THEOREM 1.2.11. Consider an arbitrarily shaped simply-connected³ filled-in
Hex board with nonempty interior with its boundary partitioned into n blue and n
yellow segments, where $n \ge 2$. Then there is at least one crossing between some
pair of segments of like color.


EXERCISE 1.c. Adapt the proof of Theorem 1.2.6 to prove Theorem 1.2.11.
(As in Hex, each entry and exit arrow lies on the boundary between a yellow and
blue segment. Unlike in Hex, in shapes with six or more sides, these four
segments can be distinct. In this case there is both a blue and a yellow crossing.
See Figure 1.16.)


FIGURE 1.16. A filled-in Six-sided Hex board can have both blue and yellow
crossings. In a game when players take turns to move, one of the crossings
will occur first, and that player will be the winner.


³ “Simply-connected” means that the board has no holes. Formally, it requires that every
continuous closed curve on the board can be continuously shrunk to a point.


1.2.5. Other partisan games played on graphs. We now discuss several 
other partisan games which are played on graphs. For each of our examples, we 
can explicitly describe a winning strategy for the first player. 


EXAMPLE 1.2.12. The Shannon Switching Game is a variant of Hex played 
by two players, Cut and Short, on a connected graph with two distinguished nodes, 
A and B. Short, in his turn, reinforces an edge of the graph, making it immune to 
being cut. Cut, in her turn, deletes an edge that has not been reinforced. Cut wins 
if she manages to disconnect A from B. Short wins if he manages to link A to B 
with a reinforced path. 


FIGURE 1.17. Shannon Switching Game played on a 5 × 6 grid (the top and
bottom rows have been merged to the points A and B). Shown are the first
three moves of the game (Short, Cut, Short), with Short moving first. Available
edges are indicated by dotted lines, and reinforced edges by thick lines. Scissors
mark the edge that Cut deleted.


We focus here on the case where the graph is an L × (L + 1) grid with the
vertices of the bottom side merged into a single vertex, A, and the vertices on the
top side merged into another node, B. In this case, the roles of the two players are
symmetric, due to planar duality. See Figure 1.18.


FIGURE 1.18. The dual graph $G^\dagger$ of a planar graph $G$ is defined as follows:
Associated with each face of $G$ is a vertex in $G^\dagger$. Two faces of $G$ are adjacent if
and only if there is an edge between the corresponding vertices of $G^\dagger$. Cutting
in $G$ is shorting in $G^\dagger$ and vice versa.


FIGURE 1.19. The figure shows corresponding positions in the Shannon 
Switching Game and an equivalent game known as Bridg-It. In Bridg-It, 
Black, in his turn, chooses two adjacent black dots and connects them with an
edge. Green tries to block Black’s progress by connecting an adjacent pair of
green dots. Black and green edges cannot cross. Black’s goal is to construct 
a path from top to bottom, while Green’s goal is to block him by building a 
left-to-right path. The black dots are on the square lattice, and the green dots 
are on the dual square lattice. 


EXERCISE 1.d. Use a strategy-stealing argument to show that the first player
in the Shannon Switching Game has a winning strategy.


Next we will describe a winning strategy for the first player, which we will 
assume is Short. We will need some definitions from graph theory. 


DEFINITION 1.2.13. A tree is a connected undirected graph without cycles.

(1) Every tree must have a leaf, a vertex of degree 1.
(2) A tree on n vertices has n − 1 edges.
(3) A connected graph with n vertices and n − 1 edges is a tree.
(4) A graph with no cycles, n vertices, and n − 1 edges is a tree.


The proofs of these properties of trees are left as an exercise (Exercise 1.10).


THEOREM 1.2.14. In Shannon’s Switching Game on an L x (L+1) board, Short 
has a winning strategy if he moves first. 


PROOF. Short begins by reinforcing an edge of the graph G, connecting A to 
an adjacent dot, a. We identify A and a by “fusing” them into a single new A. On 
the resulting graph there is a pair of edge-disjoint trees such that each tree spans 
(contains all the nodes of) G. (Indeed, there are many such pairs.) 


FIGURE 1.20. Two spanning trees: the blue one is constructed by first
joining top and bottom using the leftmost vertical edges and then adding
other vertical edges, omitting exactly one edge in each row along an imaginary
diagonal; the red tree contains the remaining edges. The two circled nodes
are identified.


For example, the blue and red subgraphs in the 4 × 5 grid in Figure 1.20 are
such a pair of spanning trees: Each of them is connected and has the right number
of edges. The same construction can be repeated on an arbitrary L × (L + 1) grid.

Using these two spanning trees, which necessarily connect A to B, we can define 
a strategy for Short. 

The first move by Cut disconnects one of the spanning trees into two components.
Short can repair the tree as follows: Because the other tree is also a spanning
tree, it must have an edge, e, that connects the two components. Short reinforces
e. See Figure 1.21.

If we think of a reinforced edge e as being both red and blue, then the resulting
red and blue subgraphs will still be spanning trees for G. To see this, note that
both subgraphs will be connected and they will still have n vertices and n − 1 edges.
Thus, by property (3), they will be trees that span every vertex of G.

Continuing in this way, Short can repair the spanning trees with a reinforced
edge each time Cut disconnects them. Thus, Cut will never succeed in disconnecting
A from B, and Short will win.


FIGURE 1.21. The left side of the figure shows how Cut separates the blue
tree into two components. The right side shows how Short reinforces a red
edge to reconnect the two components.
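
A single repair step of Short's strategy can be sketched in code. The example below is a simplified illustration, not the full strategy: it handles one cut and does not track which edges have already been reinforced. The function names and the toy graph (the complete graph on four nodes, split into two edge-disjoint spanning trees) are hypothetical and chosen only for the demonstration.

```python
def side_after_cut(nodes, tree_edges, cut_edge):
    """Return the nodes still connected to cut_edge[0] once cut_edge is deleted."""
    adjacency = {v: set() for v in nodes}
    for u, v in tree_edges:
        if {u, v} != set(cut_edge):
            adjacency[u].add(v)
            adjacency[v].add(u)
    side, stack = {cut_edge[0]}, [cut_edge[0]]
    while stack:
        for w in adjacency[stack.pop()]:
            if w not in side:
                side.add(w)
                stack.append(w)
    return side


def repair_edge(nodes, cut_tree, other_tree, cut_edge):
    """Pick an edge of other_tree that reconnects the two pieces of cut_tree."""
    side = side_after_cut(nodes, cut_tree, cut_edge)
    for u, v in other_tree:
        if (u in side) != (v in side):       # the edge joins the two components
            return (u, v)
    raise ValueError("other_tree does not span the graph")


# Hypothetical toy example: two edge-disjoint spanning trees of K4.
nodes = [0, 1, 2, 3]
blue = [(0, 1), (1, 2), (2, 3)]
red = [(0, 2), (1, 3), (0, 3)]
print(repair_edge(nodes, blue, red, (1, 2)))   # (0, 2): Short reinforces this red edge
```

Such an edge always exists because the other tree is connected and contains every node, which is exactly the observation used in the proof.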


EXAMPLE 1.2.15. Recursive Majority is a game played on a complete ternary
tree of height h (see Figure 1.22). The players take turns marking the leaves, player I
with a “+” and player II with a “−”. A parent node acquires the majority sign of
its children. Because each interior (non-leaf) vertex has three children, its sign is
determined unambiguously. The player whose mark is assigned to the root wins.


This game always ends in a win for one of the players, so one of them has a 
winning strategy. 


FIGURE 1.22. A ternary tree of height 2; the leftmost leaf is denoted by 11. 
Here player I wins the Recursive Majority game. 


To analyze the game, label each of the three edges emanating downward from
a single node 1, 2 or 3 from left to right. (See Figure 1.22.) Using these labels, we
can identify each node below the root with the label sequence on the path from the
root that leads to it. For instance, the leftmost leaf is denoted by 11...1, a word
of length h consisting entirely of ones. A strategy-stealing argument implies that
the first player to move has the advantage.

We can describe his winning strategy explicitly: On his first move, player I
marks the leaf 11...1 with a “+”. To determine his moves for the remaining even
number of leaves, he first pairs up the leaves as follows: Letting $1^k$ be shorthand for
a string of ones of fixed length $k \ge 0$ and letting w stand for an arbitrary fixed word
of length $h - k - 1$, player I pairs the leaves by the following map: $1^k 2 w \mapsto 1^k 3 w$.
(See Figure 1.23.)

Once the pairs have been identified, whenever player II marks a leaf with a
“−”, player I marks its mate with a “+”.


FIGURE 1.23. Player I marks the leftmost leaf in the first step. Some matched
leaves are marked with the same shade of green or blue.


THEOREM 1.2.16. The player I strategy described above is a winning strategy. 


PROOF. The proof is by induction on the height h of the tree. The base case
of h = 1 is immediate. By the induction hypothesis, we know that player I wins in 
the left subtree of depth h — 1. 

As for the remaining two subtrees of depth h — 1, whenever player II wins in 
one, player I wins in the other because each leaf in the middle subtree is paired 
with the corresponding leaf in the right subtree. Hence, player I is guaranteed to 
win two of the three subtrees, thus determining the sign of the root. 
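
The pairing strategy can be simulated to see the theorem in action. In the sketch below, which reflects our own encoding rather than anything in the text, a leaf is a tuple of h digits from {1, 2, 3}, `mate` implements the pairing $1^k 2 w \leftrightarrow 1^k 3 w$, and player II is assumed to choose among the free leaves at random. A simulation of this kind only samples some lines of play, whereas the proof covers all of them.

```python
import random
from itertools import product


def mate(leaf):
    """Partner of a leaf under the pairing 1^k 2 w <-> 1^k 3 w (leaf is not 11...1)."""
    k = next(i for i, d in enumerate(leaf) if d != 1)   # first digit that is not 1
    swapped = 3 if leaf[k] == 2 else 2
    return leaf[:k] + (swapped,) + leaf[k + 1:]


def root_sign(marks, h, prefix=()):
    """Majority sign of the subtree whose root has label sequence `prefix`."""
    if len(prefix) == h:
        return marks[prefix]
    signs = [root_sign(marks, h, prefix + (d,)) for d in (1, 2, 3)]
    return "+" if signs.count("+") >= 2 else "-"


def play(h, rng):
    """Player I follows the pairing strategy; player II marks free leaves at random."""
    marks = {(1,) * h: "+"}                        # player I's first move
    free = [leaf for leaf in product((1, 2, 3), repeat=h) if leaf not in marks]
    rng.shuffle(free)
    for leaf in free:
        if leaf in marks:
            continue                               # already marked as a mate
        marks[leaf] = "-"                          # player II's move
        marks[mate(leaf)] = "+"                    # player I answers with the mate
    return root_sign(marks, h)


print([play(h, random.Random(h)) for h in range(1, 5)])   # each entry should be '+'
```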


Notes 


The birth of combinatorial game theory as a mathematical subject can be traced to 
Bouton’s 1902 characterization of the winning positions in Nim [Bou02]. In this chap- 
ter, we saw that many impartial combinatorial games can be reduced to Nim. This has 
been formalized in the Sprague-Grundy theory for analyzing all 
progressively bounded impartial combinatorial games. 

The game of Chomp was invented by David Gale [BCGS82a]. It is an open research 
problem to describe a general winning strategy for Chomp. 

The game of Hex was invented by Piet Hein and reinvented by John Nash. Nash 
proved that Hex cannot end in a tie and that the first player has a winning strategy 
[Gal79]. Shimon Even and Robert Tarjan showed that determining whether a 
position in the game of Hex is a winning position is PSPACE-complete. This result was 
further generalized by Stefan Reisch [Rei81]. These results mean that an efficient algorithm 
for solving Hex on boards of arbitrary size is unlikely to exist. For small boards, however, 
an Internet-based community of Hex enthusiasts has made substantial progress (much 
of it unpublished). Jing Yang [Yan], a member of this community, has announced the 
solution of Hex (and provided associated computer programs) for boards of size up to 
9x9. Usually, Hex is played on an 11 x 11 board, for which a winning strategy for player I 
is not yet known. 

The game of Y and the Shannon Switching Game were introduced by Claude Shan- 
non [Gar88]. The Shannon Switching Game can be played on any graph. The special case 
where the graph is a rectangular grid was invented independently by David Gale under 
the name Bridg-It (see Figure 1.19). Oliver Gross proved that the player who moves first
in Bridg-It has a winning strategy. Several years later, Alfred B. Lehman (see
also [Man96]) devised a solution to the general Shannon Switching Game. For more on
the Shannon Switching Game, see [BCG82b].

For a more complete account of combinatorial games, see the books [BCG82a],
[BCG82b], [BCG04], and [YZ15, Chapter 15].


Exercises 


1.1. Consider a game of Nim with four piles, of sizes 9, 10, 11, 12. 
(a) Is this position a win for the next player or the previous player (as- 
suming optimal play)? Describe the winning first move. 
(b) Consider the same initial position, but suppose that each player is 
allowed to remove at most 9 chips in a single move (other rules of 
Nim remain in force). Is this an N- or P-position? 


1.2. Consider a game where there are two piles of chips. On a player’s turn, 
he may remove between 1 and 4 chips from the first pile or else remove 
between 1 and 5 chips from the second pile. The person who takes the 
last chip wins. Determine for which $m, n \in \mathbb{N}$ it is the case that $(m, n) \in P$.


1.3. In the game of Nimble, there are n slots arranged from left to right, and 
a finite number of coins in each slot. In each turn, a player moves one of 
the coins to the left, by any number of places. The first player who can’t 
move (since all coins are in the leftmost slot) loses. Determine which of 
the starting positions are P-positions. 


1.4. Given combinatorial games $G_1, \ldots, G_k$, let $G_1 + G_2 + \cdots + G_k$ be the
following game: A state in the game is a tuple $(x_1, x_2, \ldots, x_k)$, where $x_i$
is a state in $G_i$. In each move, a player chooses one $G_i$ and takes a step
in that game. A player who is unable to move, because all games are in a
terminal state, loses.

Let $G_1$ be the Subtraction game with subtraction set $S_1 = \{1, 3, 4\}$,
$G_2$ be the Subtraction game with $S_2 = \{2, 4, 6\}$, and $G_3$ be the Subtraction
game with $S_3 = \{1, 2, \ldots, 20\}$. Who has a winning strategy from the
starting position (100, 100, 100) in $G_1 + G_2 + G_3$?


1.5. Consider two arbitrary progressively bounded combinatorial games $G_1$ and
$G_2$ with positions $x_1$ and $x_2$. If for any third such game $G_3$ and position $x_3$,
the outcome of $(x_1, x_3)$ in $G_1 + G_3$ (i.e., whether it’s an N- or P-position)
is the same as the outcome of $(x_2, x_3)$ in $G_2 + G_3$, then we say that the
game-position pairs $(G_1, x_1)$ and $(G_2, x_2)$ are equivalent.

Prove that equivalence for game-position pairs is transitive, reflexive,
and symmetric. (Thus, it is indeed an equivalence relation.)


1.6. Let $G_i$, $i = 1, 2$, be progressively bounded impartial combinatorial games.
Prove that the position $(x_1, x_2)$ in $G_1 + G_2$ is a P-position if and only if
$(G_1, x_1)$ and $(G_2, x_2)$ are equivalent.


1.7. Consider the game of Up-and-Down Rooks played on a standard chessboard.
Player I has a set of white rooks initially located in the bottom
row, while player II has a set of black rooks in the top row. In each turn, a
player moves one of its rooks up or down a column, without skipping over
the other rook or occupying its position. The first player who cannot move
loses. This game is not progressively bounded, yet an optimal strategy
exists. Find such a strategy by relating this game to a Nim position with
8 piles.


1.8. Two players take turns placing dominos on an n x 1 board of squares, 
where each domino covers two squares and dominos cannot overlap. The 
last player to play wins. 

(a) Where would you place the first domino when n = 11? 
(b) Show that for n even and positive, the first player can guarantee a 
win. 


1.9. Recall the game of Y shown in Figure 1.12. Prove that the first player has 
a winning strategy. 


1.10. Prove the following statements. Hint: Use induction. 
(a) Every tree on n > 1 vertices must have a leaf — a vertex of degree 1. 
(Indeed, it must have at least two leaves.) 
(b) A tree on n vertices has n — 1 edges. 
(c) A connected graph with n vertices and n — 1 edges is a tree. 
(d) A graph with no cycles, n vertices, and n — 1 edges is a tree. 
CHAPTER 2 


Two-person zero-sum games 


We begin with the theory of two-person zero-sum games, developed in a 
seminal paper by John von Neumann [vN28]. In these games, one player’s loss is 
the other player’s gain. The central theorem for two-person zero-sum games is that 
even if each player’s strategy is known to the other, there is an amount that one 
player can guarantee as her expected gain, and the other, as his maximum expected 
loss. This amount is known as the value of the game. 


2.1. Examples 


FIGURE 2.1. Two people playing Pick-a-Hand. 


Consider the following game: 


EXAMPLE 2.1.1 (Pick a Hand, a betting game). There are two players, 
Chooser (player I) and Hider (player II). Hider has two gold coins in his back 
pocket. At the beginning of a turn, he¹ puts his hands behind his back and either 
takes out one coin and holds it in his left hand (strategy L1) or takes out both and 
holds them in his right hand (strategy R2). Chooser picks a hand and wins any 
coins that Hider has hidden there. She may get nothing (if the hand is empty), or 
she might win one coin, or two. How much should Chooser be willing to pay in 
order to play this game? 


¹ In almost all two-person games, we adopt the convention that player I is female and player 
II is male. 

The following matrix summarizes the payoffs to Chooser in each of the cases: 


                      Hider 
                    L1    R2 
   Chooser    L      1     0 
              R      0     2 


How should Hider and Chooser play? Imagine that they are conservative and 
want to optimize for the worst-case scenario. Hider can guarantee himself a loss of 
at most 1 by selecting action L1, whereas if he selects R2, he has the potential to 
lose 2. Chooser cannot guarantee herself any positive gain since, if she selects L, in 
the worst-case, Hider selects R2, whereas if she selects R, in the worst case, Hider 
selects L1. 

Now consider expanding the possibilities available to the players by incorporating 
randomness. Suppose that Hider selects L1 with probability $y_1$ and R2 with 
probability $y_2 = 1 - y_1$. Hider’s expected loss is $y_1$ if Chooser plays L, and $2(1 - y_1)$ 
if Chooser plays R. Thus Hider’s worst-case expected loss is $\max(y_1, 2(1 - y_1))$. 
To minimize this, Hider will choose $y_1 = 2/3$. Thus, no matter how Chooser 
plays, Hider can guarantee himself an expected loss of at most 2/3. See 
Figure 2.2. 


[Figure 2.2, two panels. Left panel: expected gain of Chooser against Chooser’s choice of $x_1$, showing the lines $x_1$ (when Hider plays L1) and $2(1 - x_1)$ (when Hider plays R2); the worst-case gain peaks at $x_1 = 2/3$. Right panel: expected loss of Hider against Hider’s choice of $y_1$, showing the lines $y_1$ (when Chooser plays L) and $2(1 - y_1)$ (when Chooser plays R); the worst-case loss is smallest at $y_1 = 2/3$.] 


FIGURE 2.2. The left side of the figure shows the worst-case expected gain of 
Chooser as a function of $x_1$, the probability with which she plays L. The right 
side of the figure shows the worst-case expected loss of Hider as a function of 
$y_1$, the probability with which he plays L1. (In this example, the two graphs 
“look” the same because the payoff matrix is symmetric. See Exercise 2.a for 
a game where the two graphs are different.) 
Similarly, suppose that Chooser selects L with probability $x_1$ and R with probability 
$x_2 = 1 - x_1$. Then Chooser’s worst-case expected gain is $\min(x_1, 2(1 - x_1))$. 
To maximize this, she will choose $x_1 = 2/3$. Thus, no matter how Hider plays, 
Chooser can guarantee herself an expected gain of at least 2/3. 

Observe that without some extra incentive, it is not in Hider’s interest to play 
Pick a Hand, because he can only lose by playing. To be enticed into joining the 
game, Hider will need to be paid at least 2/3. Conversely, Chooser should be willing 
to pay any sum below 2/3 to play the game. Thus, we say that the value of this 
game is 2/3. 
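
The worst-case computations for Pick a Hand can be reproduced numerically. The sketch below is only an illustration: the function names and the grid of step 1/300 are choices made here, not part of the text. It evaluates Chooser's worst-case gain $\min(x_1, 2(1 - x_1))$ and Hider's worst-case loss $\max(y_1, 2(1 - y_1))$ on the grid, using exact fractions.

```python
from fractions import Fraction

# Payoffs to Chooser: rows are Chooser's actions (L, R), columns are Hider's (L1, R2).
A = [[1, 0],
     [0, 2]]

def chooser_worst_case(x1):
    """Worst-case expected gain of Chooser when she plays L with probability x1."""
    x = [x1, 1 - x1]
    # Expected gain against each pure action of Hider; the worst case is the minimum.
    return min(sum(x[i] * A[i][j] for i in range(2)) for j in range(2))

def hider_worst_case(y1):
    """Worst-case expected loss of Hider when he plays L1 with probability y1."""
    y = [y1, 1 - y1]
    # Expected loss against each pure action of Chooser; the worst case is the maximum.
    return max(sum(A[i][j] * y[j] for j in range(2)) for i in range(2))

# Exact arithmetic on a grid that happens to contain 2/3 (an illustrative choice).
grid = [Fraction(k, 300) for k in range(301)]
best_x1 = max(grid, key=chooser_worst_case)
best_y1 = min(grid, key=hider_worst_case)

print(best_x1, chooser_worst_case(best_x1))
print(best_y1, hider_worst_case(best_y1))
```

On this grid both searches stop at probability 2/3, and the two worst-case quantities agree, which is the coincidence the text calls the value of the game.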
EXERCISE 2.a. Consider the betting game with the following payoff matrix: 

                      player II 
                      L     R 
   player I    T      0     2 
               B      5     1 


Draw graphs for this game analogous to those shown in Figure 2.2 and determine 
the value of the game. 


2.2. Definitions 


A two-person zero-sum game can be represented by an $m \times n$ payoff matrix 
$A = (a_{ij})$, whose rows are indexed by the $m$ possible actions of player I and whose 
columns are indexed by the $n$ possible actions of player II. Player I selects an 
action $i$ and player II selects an action $j$, each unaware of the other’s selection. 
Their selections are then revealed and player II pays player I the amount $a_{ij}$. 

If player I selects action $i$, in the worst case her gain will be $\min_j a_{ij}$, and thus 
the largest gain she can guarantee is $\max_i \min_j a_{ij}$. Similarly, if II selects action 
$j$, in the worst case his loss will be $\max_i a_{ij}$, and thus the smallest loss he can 
guarantee is $\min_j \max_i a_{ij}$. It follows that 

$$\max_i \min_j a_{ij} \le \min_j \max_i a_{ij} \tag{2.1}$$

since player I can guarantee gaining the left-hand side and player II can guarantee 
losing no more than the right-hand side. (For a formal proof, see Lemma 2.6.3.) 
As in Example 2.1.1, without randomness, the inequality is usually strict. 
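
The two sides of (2.1) are easy to compute for a given matrix. The following minimal sketch assumes the matrix is stored as a list of rows; the function name `pure_security_levels` is hypothetical and introduced only for illustration.

```python
def pure_security_levels(A):
    """Return (max_i min_j a_ij, min_j max_i a_ij) for a payoff matrix given as a list of rows."""
    # Largest gain player I can guarantee with a pure action: best row minimum.
    lower = max(min(row) for row in A)
    # Smallest loss player II can guarantee with a pure action: best column maximum.
    upper = min(max(row[j] for row in A) for j in range(len(A[0])))
    return lower, upper

pick_a_hand = [[1, 0],
               [0, 2]]
print(pure_security_levels(pick_a_hand))
```

For the Pick a Hand matrix the two numbers are 0 and 1, so the inequality is strict: without randomness Chooser can guarantee nothing, while Hider can only cap his loss at 1.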

A strategy in which each action is selected with some probability is a mixed 
strategy. A mixed strategy for player I is determined by a vector $(x_1, \ldots, x_m)^T$ 
where $x_i$ represents the probability of playing action $i$. The set of mixed strategies 
for player I is denoted by 

$$\Delta_m = \Big\{ x \in \mathbb{R}^m : x_i \ge 0, \ \sum_{i=1}^{m} x_i = 1 \Big\}.$$

Similarly, the set of mixed strategies for player II is denoted by 

$$\Delta_n = \Big\{ y \in \mathbb{R}^n : y_j \ge 0, \ \sum_{j=1}^{n} y_j = 1 \Big\}.$$
A mixed strategy in which a particular action is played with probability 1 is called 
a pure strategy. Observe that in this vector notation, pure strategies are represented 
by the standard basis vectors, though we often identify the pure strategy $e_i$ 
with the corresponding action $i$. 

If player I employs strategy $x$ and player II employs strategy $y$, the expected 
gain of player I (which is the same as the expected loss of player II) is 

$$x^T A y = \sum_i \sum_j x_i a_{ij} y_j.$$

Thus, if player I employs strategy $x$, she can guarantee herself an expected gain of 

$$\min_{y \in \Delta_n} x^T A y = \min_j (x^T A)_j \tag{2.2}$$

since for any $z \in \mathbb{R}^n$, we have $\min_{y \in \Delta_n} z^T y = \min_j z_j$. 
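
The identity behind (2.2) takes one line to verify. In the display below, $j^*$ is notation introduced here for an index at which $z_j$ is smallest.

```latex
\[
  z^T y \;=\; \sum_{j=1}^{n} y_j z_j
        \;\ge\; \Big(\min_{j} z_j\Big) \sum_{j=1}^{n} y_j
        \;=\; \min_{j} z_j
  \qquad \text{for every } y \in \Delta_n ,
\]
\[
  \text{with equality when } y = e_{j^*}, \qquad j^* \in \arg\min_{j} z_j .
\]
```

In other words, a weighted average of the coordinates of $z$ is never below the smallest coordinate, and putting all the weight on a smallest coordinate attains it. Applying this with $z^T = x^T A$ gives (2.2).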


A conservative player will choose x to maximize (2.2), that is, to maximize her 
worst-case expected gain. This is a safety strategy. 


DEFINITION 2.2.1. A mixed strategy $x^* \in \Delta_m$ is a safety strategy for 
player I if the maximum over $x \in \Delta_m$ of the function 

$$x \mapsto \min_{y \in \Delta_n} x^T A y$$

is attained at $x^*$. The value of this function at $x^*$ is the safety value for player 
I. Similarly, a mixed strategy $y^* \in \Delta_n$ is a safety strategy for player II if the 
minimum over $y \in \Delta_n$ of the function 

$$y \mapsto \max_{x \in \Delta_m} x^T A y$$

is attained at $y^*$. The value of this function at $y^*$ is the safety value for player 
II. 


REMARK 2.2.2. For the existence of safety strategies, see Lemma 2.6.3. 


2.3. The Minimax Theorem and its meaning 


Safety strategies might appear conservative, but the following celebrated theo- 
rem shows that the two players’ safety values coincide. 


THEOREM 2.3.1 (Von Neumann’s Minimax Theorem). For any two-person 
zero-sum game with $m \times n$ payoff matrix $A$, there is a number $V$, called the value 
of the game, satisfying 

$$\max_{x \in \Delta_m} \min_{y \in \Delta_n} x^T A y = V = \min_{y \in \Delta_n} \max_{x \in \Delta_m} x^T A y. \tag{2.3}$$

We will prove the Minimax Theorem in §2.6. 


REMARKS 2.3.2. 
(1) It is easy to check that the left-hand side of equation (2.3) is upper 
bounded by the right-hand side, i.e., 

$$\max_{x \in \Delta_m} \min_{y \in \Delta_n} x^T A y \le \min_{y \in \Delta_n} \max_{x \in \Delta_m} x^T A y. \tag{2.4}$$

(See the argument for (2.1) and Lemma 2.6.3.) The magic of zero-sum 
games is that, in mixed strategies, this inequality becomes an equality. 
(2) If $x^*$ is a safety strategy for player I and $y^*$ is a safety strategy for player 
II, then it follows from Theorem 2.3.1 that 

$$\min_{y \in \Delta_n} (x^*)^T A y = V = \max_{x \in \Delta_m} x^T A y^*. \tag{2.5}$$

In words, this means that the mixed strategy $x^*$ yields player I an expected 
gain of at least $V$, no matter how II plays, and the mixed strategy $y^*$ yields 
player II an expected loss of at most $V$, no matter how I plays. Therefore, 
from now on, we will refer to the safety strategies in zero-sum games as 
optimal strategies. 

(3) The Minimax Theorem has the following interpretation: If, for every strategy 
$y \in \Delta_n$ of player II, player I has a counterstrategy $x = x(y)$ that 
yields her expected payoff at least $V$, then player I has one strategy $x^*$ 
that yields her expected payoff at least $V$ against all strategies of player 
II. 


2.4. Simplifying and solving zero-sum games 


In this section, we will discuss techniques that help us understand zero-sum 
games and solve them (that is, find their value and determine optimal strategies 
for the two players). 


2.4.1. Pure optimal strategies: Saddle points. Given a zero-sum game, 
the first thing to check is whether or not there is a pair of optimal strategies that 
is pure. 

For example, in the following game, by playing action 1, player I guarantees 
herself a payoff at least 2 (since that is the smallest entry in the row). Similarly, by 
playing action 1, player II guarantees himself a loss of at most 2. Thus, the value 
of the game is 2. 


                            player II 
                        action 1   action 2 
   player I  action 1       2          3 
             action 2       1          0 


DEFINITION 2.4.1. A saddle point² of a payoff matrix $A$ is a pair $(i^*, j^*)$ such 
that 

$$\max_i a_{i j^*} = a_{i^* j^*} = \min_j a_{i^* j}. \tag{2.6}$$

If $(i^*, j^*)$ is a saddle point, then $a_{i^* j^*}$ is the value of the game. A saddle point 
is also called a pure Nash equilibrium: Given the action pair $(i^*, j^*)$, neither 
player has an incentive to deviate. See §2.5 for a more detailed discussion of Nash 
equilibria. 


² The term saddle point comes from the continuous setting where a function $f(x, y)$ of two 
variables has a point $(x^*, y^*)$ at which locally $\max_x f(x, y^*) = f(x^*, y^*) = \min_y f(x^*, y)$. Thus, 
the surface resembles a saddle that curves up in the $y$-direction and curves down in the $x$-direction. 
2.4.2. Equalizing payoffs. Most zero-sum games do not have pure optimal 
strategies. At the other extreme, some games have a pair (x*, y*) of optimal strate- 
gies that are fully mixed, that is, where each action is assigned positive probability. 
In this case, it must be that against y*, player I obtains the same payoff from 
each action. If not, say (Ay*)1 > (Ay*)2, then player I could increase her gain by 
moving probability from action 2 to action 1: This contradicts the optimality of x*. 
Applying this observation to both players enables us to solve for optimal strategies 
by equalizing payoffs. Consider, for example, the following payoff matrix, where 
each row and column is labeled with the probability that the corresponding action 
is played in the optimal strategy: 


                              player II 
                          $y_1$     $1 - y_1$ 
   player I   $x_1$         3          0 
              $1 - x_1$     1          4 


Equalizing the gains for player I’s actions, we obtain 

$$3 y_1 = y_1 + 4(1 - y_1),$$

i.e., $y_1 = 2/3$. Thus, if player II plays $(2/3, 1/3)$, his loss will not depend on player 
I’s action; it will be 2 no matter what I does. 

Similarly, equalizing the losses for player II’s actions, we obtain 

$$3 x_1 + (1 - x_1) = 4(1 - x_1),$$

i.e., $x_1 = 1/2$. So if player I plays $(1/2, 1/2)$, her gain will not depend on player 
II’s action; again, it will be 2 no matter what II does. We conclude that the value 
of the game is 2. 

See Proposition 2.5.3 for a general version of the equalization principle. 
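
For a 2 x 2 game, the two equalizing equations can be solved in closed form. The sketch below assumes integer payoffs and assumes that the game has fully mixed optimal strategies; it raises an error otherwise. The function name and the entry names `a, b, c, d` are introduced here for illustration only.

```python
from fractions import Fraction

def solve_2x2_by_equalizing(A):
    """Solve a 2 x 2 zero-sum game [[a, b], [c, d]] by equalizing payoffs.

    Returns (x1, y1, value), where x1 is the probability of row 1 and y1 the
    probability of column 1. Only valid when both solutions lie strictly in (0, 1).
    """
    (a, b), (c, d) = A
    denom = a - b - c + d
    if denom == 0:
        raise ValueError("equalizing equations are degenerate")
    # Equalize player II's losses: a*x1 + c*(1 - x1) = b*x1 + d*(1 - x1).
    x1 = Fraction(d - c, denom)
    # Equalize player I's gains: a*y1 + b*(1 - y1) = c*y1 + d*(1 - y1).
    y1 = Fraction(d - b, denom)
    if not (0 < x1 < 1 and 0 < y1 < 1):
        raise ValueError("no fully mixed solution; check for a saddle point instead")
    value = a * y1 + b * (1 - y1)
    return x1, y1, value

print(solve_2x2_by_equalizing([[3, 0], [1, 4]]))
```

Applied to the matrix above, this returns $x_1 = 1/2$, $y_1 = 2/3$, and value 2, matching the hand computation.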


EXERCISE 2.b. Show that any 2 x 2 game (i.e., a game in which each player 
has exactly two strategies) has a pair of optimal strategies that are both pure or 
both fully mixed. Show that this can fail for 3 x 3 games. 


2.4.3. The technique of domination. Domination is a technique for re- 
ducing the size of a game’s payoff matrix, enabling it to be more easily analyzed. 
Consider the following example. 


EXAMPLE 2.4.2 (Plus One). Each player chooses a number from {1,2,...,n} 
and writes it down; then the players compare the two numbers. If the numbers 
differ by one, the player with the higher number wins $1 from the other player. If 
the players’ choices differ by two or more, the player with the higher number pays 
$2 to the other player. In the event of a tie, no money changes hands. 
The payoff matrix for the game is 


                                    player II 
                   1    2    3    4    5    6   ...  n-1    n 
              1    0   -1    2    2    2    2   ...    2    2 
              2    1    0   -1    2    2    2   ...    2    2 
              3   -2    1    0   -1    2    2   ...    2    2 
   player I   4   -2   -2    1    0   -1    2   ...    2    2 
              5   -2   -2   -2    1    0   -1   ...    2    2 
              6   -2   -2   -2   -2    1    0   ...    2    2 
             ... 
            n-1   -2   -2   -2   -2   -2   -2   ...    0   -1 
              n   -2   -2   -2   -2   -2   -2   ...    1    0 

In this payoff matrix, every entry in row 4 is at most the corresponding entry 
in row 1. Thus player I has no incentive to play 4 since it is dominated by row 1. 
In fact, rows 4 through n are all dominated by row 1, and hence player I can ignore 
these rows. 
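
The domination claim can be checked mechanically for any particular $n$. The sketch below builds the Plus One matrix from the rules of the game and lists the rows dominated by row 1; the choice $n = 8$ and the helper names are assumptions made for illustration.

```python
def plus_one_matrix(n):
    """Payoff matrix (to player I) of Plus One when both players choose from 1..n."""
    def payoff(i, j):
        if i == j:
            return 0                      # tie: no money changes hands
        if abs(i - j) == 1:
            return 1 if i > j else -1     # higher number wins $1
        return -2 if i > j else 2         # higher number pays $2
    return [[payoff(i, j) for j in range(1, n + 1)] for i in range(1, n + 1)]

def dominates(A, r, s):
    """True if row r is at least row s in every column (0-based row indices)."""
    return all(A[r][j] >= A[s][j] for j in range(len(A[0])))

A = plus_one_matrix(8)
dominated_by_row_1 = [s + 1 for s in range(1, len(A)) if dominates(A, 0, s)]
print(dominated_by_row_1)
```

For $n = 8$ the list consists of rows 4 through 8, in line with the statement above; rows 2 and 3 are not dominated by row 1.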

By symmetry, we see that player II need never play any of actions 4 through 
$n$. Thus, in Plus One we can search for optimal strategies in the reduced payoff 
matrix: 

                     player II 
                    1     2     3 
   player I   1     0    -1     2 
              2     1     0    -1 
              3    -2     1     0 


To analyze the reduced game, let $x^T = (x_1, x_2, x_3)$ be player I’s mixed strategy. 
For $x$ to be optimal, each component of 

$$x^T A = (x_2 - 2x_3, \; -x_1 + x_3, \; 2x_1 - x_2) \tag{2.7}$$

must be at least the value of the game. In this game, there is complete symmetry 
between the players. This implies that the payoff matrix is antisymmetric: the 
game matrix is square, and $a_{ij} = -a_{ji}$ for every $i$ and $j$. 


CLAIM 2.4.3. If the payoff matrix of a zero-sum game is antisymmetric, then 
the game has value 0. 


PROOF. This is intuitively clear by symmetry. Formally, suppose that the 
value of the game is $V$. Then there is a vector $x \in \Delta_n$ such that for all $y \in \Delta_n$, 
$x^T A y \ge V$. In particular 

$$x^T A x \ge V. \tag{2.8}$$

Taking the transpose of both sides yields $x^T A^T x = -x^T A x \ge V$. Adding this 
latter inequality to (2.8) yields $V \le 0$. Similarly, there is a $y \in \Delta_n$ such that for all 
$x \in \Delta_n$ we have $x^T A y \le V$. Taking $x = y$ yields in the same way that $0 \le V$. 

We conclude that for any optimal strategy $x$ in Plus One 

$$x_2 - 2x_3 \ge 0, \qquad -x_1 + x_3 \ge 0, \qquad 2x_1 - x_2 \ge 0.$$

If one of these inequalities were strict, then adding the first, twice the second, and 
the third, we could deduce $0 > 0$, so in fact each of them must be an equality. 
Solving the resulting system, with the constraint $x_1 + x_2 + x_3 = 1$, we find that the 
optimal strategy for each player is (1/4, 1/2, 1/4). 
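
As a quick check of this solution, the sketch below (variable names chosen here) evaluates $x^T A$ and $A x$ for the reduced Plus One matrix at $x = (1/4, 1/2, 1/4)$ using exact fractions.

```python
from fractions import Fraction

# Reduced Plus One matrix: rows are player I's choices 1, 2, 3; columns are player II's.
A = [[0, -1, 2],
     [1, 0, -1],
     [-2, 1, 0]]
x = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]

# Expected gain of player I against each pure column when she plays x.
xTA = [sum(x[i] * A[i][j] for i in range(3)) for j in range(3)]
# Expected loss of player II against each pure row when he plays x.
Ax = [sum(A[i][j] * x[j] for j in range(3)) for i in range(3)]

print(xTA, Ax)
```

Every component of both vectors is 0, so this strategy guarantees player I at least 0 and holds player II's loss to at most 0, consistent with Claim 2.4.3.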


Summary of domination. We say a row $\ell$ of a two-person zero-sum game dominates 
row $i$ if $a_{\ell j} \ge a_{ij}$ for all $j$. When row $i$ is dominated, then there is no loss to 
player I if she never plays it. More generally, we say that a subset $I$ of rows dominates 
row $i$ if some convex combination of the rows in $I$ dominates row $i$; i.e., there is a 
probability vector $(\beta_\ell)_{\ell \in I}$ such that for every $j$ 

$$\sum_{\ell \in I} \beta_\ell a_{\ell j} \ge a_{ij}. \tag{2.9}$$

Similar definitions hold for columns. 


EXERCISE 2.c. Prove that if equation (2.9) holds, then player I can safely ignore 
row $i$. 


FIGURE 2.3. The bomber chooses one of the nine squares to bomb. She cannot 
see which squares represent the location of the submarine. 


2.4.4. Using symmetry. 


EXAMPLE 2.4.4 (Submarine Salvo). A submarine is located on two adjacent 
squares of a 3 × 3 grid. A bomber (player I), who cannot see the submerged craft, 
hovers overhead and drops a bomb on one of the nine squares. She wins $1 if she 
hits the submarine and $0 if she misses it. (See Figure 2.3.) There are nine pure 
strategies for the bomber and twelve for the submarine, so the payoff matrix for 
the game is quite large. To determine some, but not all, optimal strategies, we can 
use symmetry arguments to simplify the analysis. 

There are three types of moves that the bomber can make: She can drop a 
bomb in the center, in the middle of one of the sides, or in a corner. Similarly, there 
are three types of positions that the submarine can assume: taking up the center 
square, taking up a corner square and the adjacent square clockwise, or taking up a 
corner square and the adjacent square counter-clockwise. It is intuitive (and true) 
that both players have optimal strategies that assign equal probability to actions 
of the same type (e.g., corner-clockwise). To see this, observe that in Submarine 
Salvo a 90° rotation describes a permutation $\pi$ of the possible bomber actions 
and a permutation $\sigma$ of the possible submarine positions. Clearly $\pi^4$ (rotating by 90° 
four times) is the identity and so is $\sigma^4$. For any bomber strategy $x$, let $\pi x$ be the 
rotated row strategy. (Formally, $(\pi x)_i = x_{\pi(i)}$.) Clearly, the probability that the 
bomber will hit the submarine if they play $\pi x$ and $\sigma y$ is the same as it is when 
they play $x$ and $y$, and therefore 

$$\min_y x^T A y = \min_y (\pi x)^T A (\sigma y) = \min_y (\pi x)^T A y.$$

Thus, if $V$ is the value of the game and $x$ is optimal, then $\pi^k x$ is also optimal for 
all $k$. 

Fix any submarine strategy $y$. Then $\pi^k x$ gains at least $V$ against $y$; hence so 
does 

$$x^* = \frac{1}{4}\left(x + \pi x + \pi^2 x + \pi^3 x\right).$$

Therefore $x^*$ is a rotation-invariant optimal strategy. 
Using these equivalences, we may write down a more manageable payoff matrix: 


                                    submarine 
                      center   corner-clockwise   corner-counterclockwise 
   bomber  corner       0            1/4                    1/4 
           midside     1/4           1/4                    1/4 
           middle       1             0                      0 


Note that the values for the new payoff matrix are different from those in the 
standard payoff matrix. They incorporate the fact that when, say, the bomber is 
playing corner and the submarine is playing corner-clockwise, there is only a one- 
in-four chance that there will be a hit. In fact, the pure strategy of corner for the 
bomber in this reduced game corresponds to the mixed strategy of bombing each 
corner with probability 1/4 in the original game. Similar reasoning applies to each 
of the pure strategies in the reduced game. 

Since the rightmost two columns yield the same payoff to the submarine, it’s 
natural for the submarine to give them the same weight. This yields the mixed 
strategy of choosing uniformly one of the eight positions containing a corner. We 
can use domination to simplify the matrix even further. This is because for the 
bomber, the strategy midside dominates that of corner (because the submarine, 
when touching a corner, must also be touching a midside). This observation reduces 
the matrix to 
                      submarine 
                    center   corner 
   bomber  midside    1/4     1/4 
           middle      1       0 


Now note that for the submarine, corner dominates center, and thus we obtain 
the reduced matrix: 


                      submarine 
                      corner 
   bomber  midside     1/4 
           middle       0 


The bomber picks the better alternative — technically, another application of 
domination — and picks midside over middle. The value of the game is 1/4; an 
optimal strategy for the bomber is to hit one of the four midsides with probability 
1/4 each, and an optimal submarine strategy is to hide with probability 1/8 each 
in one of the eight possible pairs of adjacent squares that exclude the center. The 
symmetry argument is generalized in Exercise 2.21. 
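
The two strategies just described can be verified against the full, unreduced game. The sketch below enumerates the nine squares and the twelve submarine positions directly; the coordinate convention (row, column) and all variable names are choices made here for illustration.

```python
from fractions import Fraction
from itertools import product

squares = list(product(range(3), range(3)))      # (row, column) pairs
# All unordered pairs of horizontally or vertically adjacent squares.
positions = [frozenset((s, t)) for s in squares for t in squares
             if s < t and abs(s[0] - t[0]) + abs(s[1] - t[1]) == 1]

center = (1, 1)
midsides = [s for s in squares if (s[0] + s[1]) % 2 == 1]
outer_positions = [p for p in positions if center not in p]

bomber = {s: Fraction(1, 4) for s in midsides}             # uniform over the four midsides
submarine = {p: Fraction(1, 8) for p in outer_positions}   # uniform over the eight outer pairs

# Smallest hit probability the bomber obtains, over all 12 submarine positions.
bomber_guarantee = min(sum(prob for s, prob in bomber.items() if s in p)
                       for p in positions)
# Largest hit probability the submarine concedes, over all 9 bomber squares.
submarine_guarantee = max(sum(prob for p, prob in submarine.items() if s in p)
                          for s in squares)

print(len(positions), bomber_guarantee, submarine_guarantee)
```

Both guarantees equal 1/4: the bomber secures at least 1/4 against every one of the twelve positions, and the submarine concedes at most 1/4 to every one of the nine squares, so the value of the full game is 1/4.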


REMARK 2.4.5. It is perhaps surprising that in Submarine Salvo there also 
exist optimal strategies that do not assign equal probability to all actions of the 
same type. (See Exercise 2.15.) 


2.5. Nash equilibria, equalizing payoffs, and optimal strategies 


A notion of great importance in game theory is Nash equilibrium. In §2.4.1 
we introduced pure Nash equilibria. In this section, we introduce mixed Nash 
equilibria. 

DEFINITION 2.5.1. A pair of strategies $(x^*, y^*)$ is a Nash equilibrium in a 
zero-sum game with payoff matrix $A$ if 

$$\min_{y \in \Delta_n} (x^*)^T A y = (x^*)^T A y^* = \max_{x \in \Delta_m} x^T A y^*. \tag{2.10}$$

Thus, $x^*$ is a best response to $y^*$ and vice versa. 


REMARK 2.5.2. If $x^* = e_{i^*}$ and $y^* = e_{j^*}$, then by (2.2), this definition coincides 
with Definition 2.4.1. 


PROPOSITION 2.5.3. Let $x \in \Delta_m$ and $y \in \Delta_n$ be a pair of mixed strategies. 
The following are equivalent: 


(i) The vectors $x$ and $y$ are in Nash equilibrium. 
(ii) There are $V_1, V_2$ such that 

$$\sum_i x_i a_{ij} \begin{cases} = V_1 & \text{for every } j \text{ such that } y_j > 0, \\ \ge V_1 & \text{for every } j \text{ such that } y_j = 0, \end{cases} \tag{2.11}$$

and 

$$\sum_j a_{ij} y_j \begin{cases} = V_2 & \text{for every } i \text{ such that } x_i > 0, \\ \le V_2 & \text{for every } i \text{ such that } x_i = 0. \end{cases} \tag{2.12}$$

(iii) The vectors $x$ and $y$ are optimal. 


REMARK 2.5.4. If (2.11) and (2.12) hold, then 

$$V_1 = \sum_j y_j \sum_i x_i a_{ij} = \sum_i x_i \sum_j a_{ij} y_j = V_2.$$


PROOF. (i) implies (ii): Clearly, $y$ is a best response to $x$ if and only if $y$ 
assigns positive probability only to actions that yield II the minimum loss given $x$; 
this is precisely (2.11). The argument for (2.12) is identical. Thus (i) and (ii) are 
equivalent. 

(ii) implies (iii): Player I guarantees herself a gain of at least $V_1$ and player II 
guarantees himself a loss of at most $V_2$. By the remark, $V_1 = V_2$, and therefore 
these are optimal. 

(iii) implies (i): Let $V = x^T A y$ be the value of the game. Since playing $x$ 
guarantees I a gain of at least $V$, player II has no incentive to deviate from $y$. 
Similarly for player I. 
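
Condition (ii) of the proposition gives a finite test for equilibrium. In the sketch below (the function name is hypothetical), $V_1$ is taken to be the smallest column loss and $V_2$ the largest row gain, which is the only choice compatible with the inequalities in (2.11) and (2.12); it then remains to check equality on the supports. Exact fractions are assumed so that equality tests are meaningful.

```python
from fractions import Fraction

def is_nash_equilibrium(A, x, y):
    """Check conditions (2.11) and (2.12) for mixed strategies x and y."""
    m, n = len(A), len(A[0])
    # Expected loss of player II for each pure column, given x.
    col_losses = [sum(x[i] * A[i][j] for i in range(m)) for j in range(n)]
    # Expected gain of player I for each pure row, given y.
    row_gains = [sum(A[i][j] * y[j] for j in range(n)) for i in range(m)]
    v1, v2 = min(col_losses), max(row_gains)
    # Columns used by y must attain the minimum loss; rows used by x the maximum gain.
    cond_211 = all(col_losses[j] == v1 for j in range(n) if y[j] > 0)
    cond_212 = all(row_gains[i] == v2 for i in range(m) if x[i] > 0)
    return cond_211 and cond_212

A = [[3, 0], [1, 4]]
x = [Fraction(1, 2), Fraction(1, 2)]
y = [Fraction(2, 3), Fraction(1, 3)]
print(is_nash_equilibrium(A, x, y))
```

For the game of §2.4.2 the pair $((1/2, 1/2), (2/3, 1/3))$ passes the test, while replacing $x$ by the pure strategy $(1, 0)$ fails it, since player II would then shift all his weight to the second column.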


2.5.1. A first glimpse of incomplete information. 


EXAMPLE 2.5.5 (A random game). Consider the zero-sum two-player game 
in which the game to be played is randomized by a fair coin toss. If the toss comes 
up heads, the payoff matrix is given by $A^H$, and if tails, it is given by $A^T$. 


   $A^H$:              player II          $A^T$:              player II 
                       L     R                                L     R 
        player I  U    8     2                 player I  U    2     6 
                  D    6     0                           D    4    10 


If the players don’t know the outcome of the coin flip before playing, they are 
merely playing the game given by the average matrix 

$$\frac{1}{2} A^H + \frac{1}{2} A^T = \begin{pmatrix} 5 & 4 \\ 5 & 5 \end{pmatrix},$$

which has a value of 5. (For this particular matrix, the value does not change if I 
is required to reveal her move first.) 

Now suppose that I (but not II) is told the result of the coin toss and is required 
to reveal her move first. If I adopts the simple strategy of picking the best row in 
whichever game is being played and II realizes this and counters, then I has an 
expected payoff of only 3, less than the expected payoff if she ignores the extra 
information! See §6.3.3 for a detailed analysis of this and related 
games. 
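
The numbers 5 and 3 in this example can be recomputed as follows. The sketch interprets "the best row" as the row with the largest guaranteed entry, which is an assumption made here; in both matrices it coincides with the dominant row. Because that row is different in the two games, player II can read the coin toss off player I's move and then pick the column that is worst for her.

```python
from fractions import Fraction

A_H = [[8, 2], [6, 0]]     # rows U, D; columns L, R (coin shows heads)
A_T = [[2, 6], [4, 10]]    # coin shows tails
half = Fraction(1, 2)

# Neither player sees the coin: they face the average matrix.
A_avg = [[half * A_H[i][j] + half * A_T[i][j] for j in range(2)] for i in range(2)]

def maxmin_pure(A):
    """Payoff player I secures by revealing a row and letting II reply optimally."""
    return max(min(row) for row in A)

def minmax_pure(A):
    """Loss player II secures by revealing a column and letting I reply optimally."""
    return min(max(row[j] for row in A) for j in range(len(A[0])))

uninformed = maxmin_pure(A_avg)
# Informed player I plays her best row in each game; II infers the game and counters.
naive_informed = half * maxmin_pure(A_H) + half * maxmin_pure(A_T)

print(A_avg, uninformed, minmax_pure(A_avg), naive_informed)
```

The average matrix has a saddle point with value 5, whereas the naive use of the extra information yields only $\frac{1}{2} \cdot 2 + \frac{1}{2} \cdot 4 = 3$.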

This example demonstrates that sometimes the best strategy is to ignore extra 
information and play as if it were unknown. A related example arose during World 
War II. Polish and British cryptanalysts had broken the secret code the Germans 
were using (the Enigma machine) and could therefore decode the Germans’ com- 
munications. This created a challenging dilemma for the Allies: Acting on the 
decoded information could reveal to the Germans that their code had been broken, 
which could lead them to switch to more secure encryption. 


EXERCISE 2.d. What is the value of the game if both players know the outcome 
of the coin flip? 
2.6. Proof of von Neumann’s Minimax Theorem* 


We now prove the von Neumann Minimax Theorem. A different, constructive, 
proof is given in §18.4.3. The proof will rely on a basic theorem from convex 
geometry. 


Recall first that the (Euclidean) norm of a vector $v$ is the (Euclidean) 
distance between $0$ and $v$ and is denoted by $\|v\|$. Thus $\|v\| = \sqrt{v^T v}$. A subset 
of a metric space is closed if it contains all its limit points, and bounded if it is 
contained inside a ball of some finite radius $R$. 


DEFINITION 2.6.1. A set $K \subseteq \mathbb{R}^d$ is convex if, for any two points $a, b \in K$, 
the line segment connecting them also lies in $K$. In other words, for every $a, b \in K$ 
and $p \in [0, 1]$, 

$$p a + (1 - p) b \in K.$$


THEOREM 2.6.2 (The Separating Hyperplane Theorem). Suppose that 
$K \subseteq \mathbb{R}^d$ is closed and convex. If $0 \notin K$, then there exist $z \in \mathbb{R}^d$ and $c \in \mathbb{R}$ such 
that 

$$0 < c < z^T v$$

for all $v \in K$. 


Here $0$ denotes the vector of all 0’s. The theorem says that there is a hyperplane 
(a line in two dimensions, a plane in three dimensions, or, more generally, 
an affine $\mathbb{R}^{d-1}$-subspace in $\mathbb{R}^d$) that separates $0$ from $K$. In particular, on any 
continuous path from $0$ to $K$, there is some point that lies on this hyperplane. The 
separating hyperplane is given by $\{x \in \mathbb{R}^d : z^T x = c\}$. The point $0$ lies in the 
half-space $\{x \in \mathbb{R}^d : z^T x < c\}$, while the convex body $K$ lies in the complementary 
half-space $\{x \in \mathbb{R}^d : z^T x > c\}$. 


FIGURE 2.4. Hyperplane separating the closed convex body K from 0. 


In what follows, the metric is the Euclidean metric. 


PROOF OF THEOREM 2.6.2. Choose $r$ so that the ball $B_r = \{x \in \mathbb{R}^d : \|x\| \le r\}$ 
intersects $K$. Then the function $w \mapsto \|w\|$, considered as a map from $K \cap B_r$ 
to $[0, \infty)$, is continuous, with a domain that is nonempty, closed, and bounded (see 
Figure 2.5). Thus the map attains its infimum at some point $z$ in $K$. For this 
$z \in K$, we have 

$$\|z\| = \inf_{w \in K} \|w\|.$$


FIGURE 2.5. Intersecting $K$ with a ball to get a nonempty closed bounded domain. 


Let $v \in K$. Because $K$ is convex, for any $\varepsilon \in (0, 1)$, we have that 
$\varepsilon v + (1 - \varepsilon) z = z - \varepsilon (z - v) \in K$. Since $z$ has the minimum norm of any point in $K$, 

$$\|z\|^2 \le \|z - \varepsilon (z - v)\|^2 = \|z\|^2 - 2 \varepsilon z^T (z - v) + \varepsilon^2 \|z - v\|^2.$$

Rearranging terms, we get 

$$2 \varepsilon z^T (z - v) \le \varepsilon^2 \|z - v\|^2, \quad \text{that is,} \quad z^T (z - v) \le \frac{\varepsilon}{2} \|z - v\|^2.$$

Letting $\varepsilon$ approach 0, we find 

$$z^T (z - v) \le 0, \quad \text{which means that} \quad \|z\|^2 \le z^T v.$$

Since $z \in K$ and $0 \notin K$, the norm $\|z\| > 0$. Choosing $c = \frac{1}{2} \|z\|^2$, we get $0 < c < z^T v$ 
for all $v \in K$. 
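
A small worked instance may help fix the construction. The set $K$ and the point $a$ below are chosen here purely for illustration: take $K$ to be the closed unit disk centered at $a = (3, 4)$ in $\mathbb{R}^2$, which does not contain $0$ because $\|a\| = 5 > 1$.

```latex
\[
  K = \{ v \in \mathbb{R}^2 : \|v - a\| \le 1 \}, \qquad a = (3, 4), \qquad \|a\| = 5 .
\]
% The point of K closest to the origin lies on the segment from 0 to a:
\[
  z = \Big(1 - \tfrac{1}{5}\Big) a = \Big(\tfrac{12}{5}, \tfrac{16}{5}\Big),
  \qquad \|z\| = 4, \qquad c = \tfrac{1}{2} \|z\|^2 = 8 .
\]
% For every v in K, by the Cauchy-Schwarz inequality:
\[
  z^T v = z^T a + z^T (v - a) \;\ge\; z^T a - \|z\| \, \|v - a\|
        \;\ge\; 20 - 4 = 16 = \|z\|^2 \;>\; c \;>\; 0 .
\]
```

Here the line $\{x : z^T x = 8\}$ separates the origin from the disk, exactly as in the proof: the minimum-norm point $z$ supplies the normal direction, and $c$ is half of $\|z\|^2$.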


We will also need the following simple lemma: 


LEMMA 2.6.3. Let $X$ and $Y$ be closed and bounded sets in $\mathbb{R}^d$. Let $f : X \times Y \to \mathbb{R}$ 
be continuous. Then 

$$\max_{x \in X} \min_{y \in Y} f(x, y) \le \min_{y \in Y} \max_{x \in X} f(x, y). \tag{2.13}$$


PROOF. We first prove the lemma for the case where $X$ and $Y$ are finite sets 
(with no assumptions on $f$). Let $(\tilde{x}, \tilde{y}) \in X \times Y$. Clearly 

$$\min_{y \in Y} f(\tilde{x}, y) \le f(\tilde{x}, \tilde{y}) \le \max_{x \in X} f(x, \tilde{y}).$$

Because the inequality holds for any $\tilde{x} \in X$, 

$$\max_{\tilde{x} \in X} \min_{y \in Y} f(\tilde{x}, y) \le \max_{x \in X} f(x, \tilde{y}).$$

Minimizing over $\tilde{y} \in Y$, we obtain (2.13). 

To prove the lemma in the general case, we just need to verify the existence 
of the relevant maxima and minima. Since continuous functions achieve their minimum 
on compact sets, $g(x) = \min_{y \in Y} f(x, y)$ is well-defined. The continuity of 
$f$ and compactness of $X \times Y$ imply that $f$ is uniformly continuous on $X \times Y$. In 
particular, 

$$\forall \varepsilon \; \exists \delta : \quad |x_1 - x_2| < \delta \implies |f(x_1, y) - f(x_2, y)| \le \varepsilon$$

and hence $|g(x_1) - g(x_2)| \le \varepsilon$. Thus, $g : X \to \mathbb{R}$ is a continuous function and 
$\max_{x \in X} g(x)$ exists. 


We can now prove 


THEOREM 2.3.1 (Von Neumann’s Minimax Theorem). Let $A$ be an $m \times n$ 
payoff matrix, and let $\Delta_m = \{x \in \mathbb{R}^m : x \ge 0, \sum_i x_i = 1\}$ and 
$\Delta_n = \{y \in \mathbb{R}^n : y \ge 0, \sum_j y_j = 1\}$. Then 

$$\max_{x \in \Delta_m} \min_{y \in \Delta_n} x^T A y = \min_{y \in \Delta_n} \max_{x \in \Delta_m} x^T A y. \tag{2.14}$$

As we discussed earlier, this quantity is called the value of the two-person 
zero-sum game with payoff matrix $A$. 


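PROOF. The inequality 

$$\max_{x \in \Delta_m} \min_{y \in \Delta_n} x^T A y \le \min_{y \in \Delta_n} \max_{x \in \Delta_m} x^T A y$$

follows immediately from Lemma 2.6.3 because $f(x, y) = x^T A y$ is a continuous 
function in both variables and $\Delta_m \subset \mathbb{R}^m$, $\Delta_n \subset \mathbb{R}^n$ are closed and bounded. 

To prove that the left-hand side of (2.14) is at least the right-hand side, suppose 
that 

$$\lambda < \min_{y \in \Delta_n} \max_{x \in \Delta_m} x^T A y. \tag{2.15}$$

Define a new game with payoff matrix $\hat{A}$ given by $\hat{a}_{ij} = a_{ij} - \lambda$. For this new game 

$$0 < \min_{y \in \Delta_n} \max_{x \in \Delta_m} x^T \hat{A} y. \tag{2.16}$$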
yeA, XEAm 


Each mixed strategy $y \in \Delta_n$ for player II yields a gain vector $\hat{A} y \in \mathbb{R}^m$. Let 
$K$ denote the set of all vectors which dominate some gain vector $\hat{A} y$; that is, 

$$K = \{\hat{A} y + v : y \in \Delta_n, \ v \in \mathbb{R}^m, \ v \ge 0\}.$$

The set $K$ is convex and closed since $\Delta_n$ is closed, bounded, and convex, and 
the set $\{v \in \mathbb{R}^m : v \ge 0\}$ is closed and convex. (See Exercise 2.19.) Also, $K$ cannot 
contain the $0$ vector, because if $0$ was in $K$, there would be some mixed strategy 
$y \in \Delta_n$ such that $\hat{A} y \le 0$. But this would imply that $\max_{x \in \Delta_m} x^T \hat{A} y \le 0$, 
contradicting (2.16). 

Thus $K$ satisfies the conditions of the Separating Hyperplane Theorem (Theorem 
2.6.2), which gives us $z \in \mathbb{R}^m$ and $c > 0$ such that $z^T w > c > 0$ for all $w \in K$. 
That is, 

$$z^T (\hat{A} y + v) > c > 0 \quad \text{for all } y \in \Delta_n \text{ and } v \ge 0. \tag{2.17}$$

We claim that $z \ge 0$. If not, say $z_j < 0$ for some $j$, then for $v \in \mathbb{R}^m$ with $v_j$ 
sufficiently large and $v_i = 0$ for all $i \ne j$, we would have $z^T (\hat{A} y + v) = z^T \hat{A} y + z_j v_j < 0$ 
for some $y \in \Delta_n$, which would contradict (2.17). 

It also follows from (2.17) that $z \ne 0$. Thus $s = \sum_{i=1}^{m} z_i$ is strictly positive, so 
$\tilde{x} = \frac{1}{s} (z_1, \ldots, z_m)^T = z/s \in \Delta_m$ satisfies $\tilde{x}^T \hat{A} y > c/s > 0$ for all $y \in \Delta_n$. 
Therefore, $\min_{y \in \Delta_n} \tilde{x}^T A y > \lambda$, whence 

$$\max_{x \in \Delta_m} \min_{y \in \Delta_n} x^T A y > \lambda.$$

Since this holds for every $\lambda$ satisfying (2.15), the theorem follows. 


2.7. Zero-sum games with infinite action spaces* 


THEOREM 2.7.1. Consider a zero-sum game in which the players’ action spaces 
are $[0, 1]$ and the gain is $A(x, y)$ when player I chooses action $x$ and player II chooses 
action $y$. Suppose that $A(x, y)$ is continuous on $[0, 1]^2$. Let $\Delta = \Delta_{[0,1]}$ be the space 
of probability distributions on $[0, 1]$. Then 

$$\max_{F \in \Delta} \min_{G \in \Delta} \iint A(x, y) \, dF(x) \, dG(y) = \min_{G \in \Delta} \max_{F \in \Delta} \iint A(x, y) \, dF(x) \, dG(y). \tag{2.18}$$


PROOF. If there is a matrix (a;;) for which 


then (2.18) reduces to the finite case. If $A$ is continuous, then there are functions 
$A_0$ and $A_1$ of the form (2.19) so that $A_0 \le A \le A_1$ and $|A_1 - A_0| \le \varepsilon$. This implies 
(2.18) with infs and sups in place of min and max. The existence of the maxima 
and minima follows from compactness of $\Delta_{[0,1]}$ as in the proof of Lemma 2.6.3. 


REMARK 2.7.2. The previous theorem applies in any setting where the action 
spaces are compact metric spaces and the payoff function is continuous. 


EXERCISE 2.e. Two players each choose a number in [0,1]. If they choose the 
same number, the payoff is 0. Otherwise, the player that chose the lower number 
pays $1 to the player who chose the higher number, unless the higher number 
is 1, in which case the payment is reversed. Show that this game has no mixed 
Nash equilibrium. Show that the safety values for players I and II are -1 and 1, 
respectively. 


REMARK 2.7.3. The game from the previous exercise shows that the continu- 
ity assumption on the payoff function A(x, y) cannot be removed. See also 


Notes 


The theory of two-person zero-sum games was first laid out in a 1928 paper by John 
von Neumann [vN28], where he proved the Minimax Theorem (Theorem 2.3.1). The foundations 
were further developed in the book by von Neumann and Morgenstern [vNM53], 
first published in 1944. 

The original proof of the Minimax Theorem used a fixed point theorem. A proof based 
on the Separating Hyperplane Theorem was given by Weyl [Wey50], and 
an inductive proof was given by Owen. Subsequently, other minimax theorems 
were proved, such as Theorem 2.7.1, which is due to Glicksberg [GI52], and Sion’s minimax 
theorem [Sio58]. An influential example of a zero-sum game on the unit square with discontinuous 
payoff functions and without a value is in [SW57a]. Games of timing also have 
discontinuous payoff functions, but do have a value. See, e.g., [Gar00]. 


John von Neumann Oskar Morgenstern 


Given an $m \times n$ payoff matrix $A$, the optimal strategy for player I might be supported 
on all $m$ rows. However, Lipton and Young [LY94] showed that (assuming $0 \le a_{ij} \le 1$ for 
all $i, j$), player I has an $\varepsilon$-optimal strategy supported on $k = O(\ln n / \varepsilon^2)$ rows. This follows by 
sampling $k$ rows at random from the optimal mixed strategy and applying the Hoeffding-Azuma 
Inequality (Theorem B.2.2). 

Exercise 2.2 is from [Kar59]. Exercise 2.17 comes from [HS89]. Exercise 2.18 is an
example of a class of recursive games studied in [Eve57].

More detailed accounts of the material in this chapter can be found in Ferguson [Fer08],
Karlin [Kar59], and Owen [Owe95], among others.

In §2.4 we present techniques for simplifying and solving zero-sum games by hand.
However, for large games, there are efficient algorithms for finding optimal strategies 
and the value of the game based on linear programming. A brief introduction to linear 
programming can be found in There are also many books on the topic 
including, for example, [MG07]. 

The Minimax Theorem played a key role in the development of linear programming. 
George Dantzig, one of the pioneers of linear programming, relays the following story 
about his first meeting with John von Neumann [Dan82]. 


FIGURE 2.6. Von Neumann explaining duality to Dantzig. 


On October 3, 1947, I visited him (von Neumann) for the first time at 
the Institute for Advanced Study at Princeton. I remember trying to
describe to von Neumann, as I would to an ordinary mortal, the Air
Force problem. I began with the formulation of the linear program- 
ming model in terms of activities and items, etc. Von Neumann did 
something which I believe was uncharacteristic of him. “Get to the 
point,” he said impatiently. Having at times a somewhat low kindling- 
point, I said to myself “O.K., if he wants a quicky, then thats what he 
will get.” In under one minute I slapped the geometric and algebraic 
version of the problem on the blackboard. Von Neumann stood up 
and said “Oh that!” Then for the next hour and a half, he proceeded 
to give me a lecture on the mathematical theory of linear programs. 

At one point seeing me sitting there with my eyes popping and my 
mouth open (after I had searched the literature and found nothing), 
von Neumann said: “I don’t want you to think I am pulling all this 
out of my sleeve at the spur of the moment like a magician. I have 
just recently completed a book with Oskar Morgenstern on the theory 
of games. What I am doing is conjecturing that the two problems are 
equivalent. The theory that I am outlining for your problem is an 
analogue to the one we have developed for games.” Thus I learned 
about Farkas’ Lemma, and about duality for the first time. 


Exercises 


2.1. Show that all saddle points in a zero-sum game (assuming there is at least 
one) result in the same payoff to player I. 


2.2. Show that if a zero-sum game has a saddle point in every 2 x 2 submatrix, 
then it has a saddle point. 


2.3. Find the value of the following zero-sum game and determine some optimal 
strategies for each of the players: 


8 3 4 1 
4 7 1 6 
0 3 8 5 
2.4. Find the value of the zero-sum game given by the following payoff matrix, 
and determine some optimal strategies for each of the players: 
0 9 1 1
5 0 6 7 
2 4 3 3 
2.5. Find the value of the zero-sum game given by the following payoff matrix 
and determine all optimal strategies for both players: 
3 0
0 3
2 2
S 2.6. Given a 5 x 5 zero-sum game, such as the following, how would you quickly
determine by hand if it has a saddle point: 


20 
2 
10 
5 
3 


“ID OWH 

oOrRwNA OO 

PNO KRW 

oN oO RR 
x) 


2.7. Give an example of a two-person zero-sum game where there are no pure
Nash equilibria. Can you give an example where all the entries of the payoff 
matrix are different? 


2.8. Define a zero-sum game in which one player’s unique optimal strategy is 
pure and all of the other player’s optimal strategies are mixed. 


2.9. Player II is moving an important item in one of three cars, labeled 1, 2, 
and 3. Player I will drop a bomb on one of the cars of her choosing. She 
has no chance of destroying the item if she bombs the wrong car. If she 
chooses the right car, then her probability of destroying the item depends 
on that car. The probabilities for cars 1, 2, and 3 are equal to 3/4, 1/4, 
and 1/2. 

Write the 3 x 3 payoff matrix for the game, and find an optimal strat- 
egy for each player. 


2.10. Using the result of Proposition 2.5.3, give an exponential-time algorithm to
solve an n x m two-person zero-sum game. Hint: Consider each possibility
for which subset S of player I strategies have $x_i > 0$ and which subset T of
player II strategies have $y_j > 0$.


2.11. Consider the following two-person zero-sum game. Both players simulta- 
neously call out one of the numbers {2,3}. Player I wins if the sum of the 
numbers called is odd and player II wins if their sum is even. The loser 
pays the winner the product of the two numbers called (in dollars). Find 
the payoff matrix, the value of the game, and an optimal strategy for each 
player. 


2.12. Consider the four-mile stretch of road shown in Figure 2.7. There are three
locations at which restaurants can be opened: Left, Central, and Right. 
Company I opens a restaurant at one of these locations and company II 
opens two restaurants (both restaurants can be at the same location). A 
customer is located at a uniformly random location along the four-mile
stretch. He walks to the closest location at which there is a restaurant and
then into one of the restaurants there, chosen uniformly at random. The 
payoff to company I is the probability that the customer visits a company I 
restaurant. Determine the value of the game, and find some optimal mixed 
strategies for the companies. 


[Figure: the four-mile road, divided into four 1-mile segments by the locations Left, Central, and Right. The customer is at a uniformly random location.]


FIGURE 2.7. Restaurant location game. 


2.13. Bob has a concession at Yankee Stadium. He can sell 500 umbrellas at 
$10 each if it rains. (The umbrellas cost him $5 each.) If it shines, he 
can sell only 100 umbrellas at $10 each and 1000 sunglasses at $5 each. 
(The sunglasses cost him $2 each.) He has $2500 to invest in one day, but 
everything that isn’t sold is trampled by the fans and is a total loss. 

This is a game against nature. Nature has two strategies: rain and 
shine. Bob also has two strategies: buy for rain or buy for shine. 

Find the optimal strategy for Bob assuming that the probability for 
rain is 50%. 


2.14. The Number Picking Game: Two players I and II pick a positive integer 
each. If the two numbers are the same, no money changes hands. If the 
players’ choices differ by 1, the player with the lower number pays $1 to 
the opponent. If the difference is at least 2, the player with the higher 
number pays $2 to the opponent. Find the value of this zero-sum game 
and determine optimal strategies for both players. (Hint: Use domination.) 


2.15. Show that in Submarine Salvo the submarine has an optimal strategy where 
all choices containing a corner and a clockwise adjacent site are excluded. 


2.16. A zebra has four possible locations to cross the Zambezi River; call them a, 
b, c, and d, arranged from north to south. A crocodile can wait (undetected) 
at one of these locations. If the zebra and the crocodile choose the same
location, the payoff to the crocodile (that is, the chance it will catch the
zebra) is 1. The payoff to the crocodile is 1/2 if they choose adjacent 
locations, and 0 in the remaining cases, when the locations chosen are 
distinct and nonadjacent. 

(a) Write the payoff matrix for this game. 

(b) Can you reduce this game to a 2 x 2 game? 

(c) Find the value of the game (to the crocodile) and optimal strategies for both.


S 2.17. Generalized Matching Pennies: Consider a directed graph G = (V, E) with
nonnegative weights $w_{ij}$ on each edge (i, j). Let $W_i = \sum_j w_{ij}$. Each player
chooses a vertex, say i for player I and j for player II. Player I receives a
payoff of $w_{ij}$ if $i \ne j$ and loses $W_i - w_{ii}$ if $i = j$. Thus, the payoff matrix
A has entries $a_{ij} = w_{ij} - 1_{\{i=j\}} W_i$. If n = 2 and the $w_{ij}$'s are all 1, this
game is called Matching Pennies.

• Show that the game has value 0.

• Deduce that for some $x \in \Delta_n$, $x^T A = 0$.


2.18. A recursive zero-sum game: Trumm Seafood has wild salmon on the menu.
Each day, the owner, Mr. Trumm, decides whether to cheat and serve the
cheaper farmed salmon instead. An inspector selects a day in 1,...,n and
inspects the restaurant on that day. The payoff to the inspector is 1 if he
inspects while Trumm is cheating. The payoff is −1 if Trumm cheats
and is not caught. The payoff is also −1 if the inspector inspects but
Trumm did not cheat and there is at least one day left. This leads to the
following matrices $\Gamma_n$ for the game with n days: The matrix $\Gamma_1$ is shown
on the left, and the matrix $\Gamma_n$ is shown on the right.


                        Trumm                                   Trumm
                    cheat   honest                          cheat   honest
inspector  inspect    1       0         inspector  inspect    1       −1
           wait      −1       0                    wait      −1    $\Gamma_{n-1}$


Find the optimal strategies and the value of $\Gamma_n$.


S 2.19. Prove that if the set $G \subseteq \mathbb{R}^d$ is compact and $H \subseteq \mathbb{R}^d$ is closed, then G + H
is closed. (This fact is used in the proof of the Minimax Theorem to show
that the set K is closed.)


S 2.20. Find two closed sets $F_1, F_2 \subseteq \mathbb{R}^2$ such that $F_1 - F_2$ is not closed.


S*2.21. Consider a zero-sum game A and suppose that $\pi$ and $\sigma$ are permutations of
player I’s strategies {1,...,m} and player II’s strategies {1,...,n}, respectively,
such that

    $a_{\pi(i)\sigma(j)} = a_{ij}$    (2.1)

for all i and j. Show that there exist optimal strategies $x^*$ and $y^*$ such
that $x^*_i = x^*_{\pi(i)}$ for all i and $y^*_j = y^*_{\sigma(j)}$ for all j.

S 2.22. Player I chooses a positive integer x > 0 and player II chooses a positive
integer y > 0. The player with the lower number pays a dollar to the player
with the higher number unless the higher number is more than twice the lower
number, in which case the payments are reversed. That is,

    $A(x, y) = 1$     if $y < x \le 2y$ or $x < y/2$,
    $A(x, y) = -1$    if $x < y \le 2x$ or $y < x/2$,
    $A(x, y) = 0$     if $x = y$.


Find the unique optimal strategy in this game.


2.23. Two players each choose a positive integer. The player that chose the lower 
number pays $1 to the player who chose the higher number (with no pay- 
ment in case of a tie). Show that this game has no Nash equilibrium. Show 
that the safety values for players I and II are -1 and 1, respectively.


2.24. Two players each choose a number in [0, 1]. Suppose that $A(x, y) = |x - y|$.
• Show that the value of the game is 1/2.
• More generally, suppose that A(x, y) is a convex function in each of
x and y and that it is continuous. Show that player I has an optimal
strategy supported on 2 points and player II has an optimal pure
strategy.


2.25. Consider a zero-sum game in which the strategy spaces are [−1, 1] and the
gain of player I when she plays x and player II plays y is

    $A(x, y) = \log \frac{1}{|x - y|}.$

Show that I picking $X = \cos \Theta$, where $\Theta$ is uniform on $[0, \pi]$, and II using
the same strategy is a pair of optimal strategies.


CHAPTER 3


Zero-sum games on graphs 


In this chapter, we consider a number of graph-theoretic zero-sum games. 


3.1. Games in series and in parallel 


EXAMPLE 3.1.1 (Hannibal and the Romans). Hannibal and his army (player
I) and the Romans (player II) are on opposite sides of a mountain. There are two
routes available for crossing the mountain. If they both choose the wide mountain
pass, their confrontation is captured by a zero-sum game $G_1$. If they both choose the
narrow mountain pass, their confrontation is captured by a zero-sum game $G_2$, with
different actions and payoffs (e.g., the elephants cannot cross the narrow pass). If
they choose different passes, no confrontation occurs and the payoff is 0. We assume
that the value of both $G_1$ and $G_2$ is positive. The resulting game is a parallel-sum
of $G_1$ and $G_2$.


FIGURE 3.1. Hannibal approaching battle. 


In the second scenario, Hannibal has two separate and consecutive battles with
two Roman armies. Again, each battle is captured by a zero-sum game, the first
$G_1$ and the second $G_2$. This is an example of a series-sum game.


DEFINITION 3.1.2. Given two zero-sum games $G_1$ and $G_2$, their series-sum
game corresponds to playing $G_1$ and then $G_2$. In a parallel-sum game, each
player chooses either $G_1$ or $G_2$ to play. If each picks the same game, then it is that
game which is played. If they differ, then no game is played, and the payoff is zero.


EXERCISE 3.a. Show that if $G_i$ has value $v_i$ for i = 1, 2, then their series-sum
game has value $v_1 + v_2$.


To solve for optimal strategies in the parallel-sum game, we write a big payoff
matrix, in which player I’s strategies are the union of her strategies in $G_1$ and her
strategies in $G_2$ as follows:


                                                     player II
                                   pure strategies of $G_1$   pure strategies of $G_2$
player I  pure strategies of $G_1$          $G_1$                      0
          pure strategies of $G_2$            0                      $G_2$


In this payoff matrix, we have abused notation and written $G_1$ and $G_2$ inside the
matrix to denote the payoff matrices of $G_1$ and $G_2$, respectively. If the two players
play $G_1$ and $G_2$ optimally, the payoff matrix can be reduced to:


                                     player II
                          play in $G_1$   play in $G_2$
player I  play in $G_1$       $v_1$             0
          play in $G_2$         0             $v_2$


Thus to find optimal strategies, the players just need to determine with what
probability they should play $G_1$ and with what probability they should play $G_2$. If
both payoffs $v_1$ and $v_2$ are positive, the optimal strategy for each player consists
of playing $G_1$ with probability $v_2/(v_1 + v_2)$ and $G_2$ with probability $v_1/(v_1 + v_2)$.
The value of the parallel-sum game is

    $\frac{v_1 v_2}{v_1 + v_2} = \frac{1}{1/v_1 + 1/v_2}.$
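As a quick numerical check of the parallel-sum formula, here is a minimal Python sketch. The function name `parallel_sum` and the sample values 2 and 3 are illustrative assumptions, not taken from the text. It computes the mixing probabilities and confirms that they equalize the payoff in the reduced 2 x 2 diagonal game.

```python
from fractions import Fraction

def parallel_sum(v1, v2):
    """Value and mixing probabilities for the reduced game [[v1, 0], [0, v2]].

    Assumes both component values are positive, as in the text.
    """
    v1, v2 = Fraction(v1), Fraction(v2)
    if v1 <= 0 or v2 <= 0:
        raise ValueError("both component values must be positive")
    p1 = v2 / (v1 + v2)              # probability of choosing G_1
    p2 = v1 / (v1 + v2)              # probability of choosing G_2
    value = v1 * v2 / (v1 + v2)
    return value, (p1, p2)

# Hypothetical component values v1 = 2, v2 = 3.
value, (p1, p2) = parallel_sum(2, 3)

# Expected payoff of the mixed strategy against each pure choice of the opponent.
payoff_vs_g1 = p1 * 2                # opponent always picks G_1
payoff_vs_g2 = p2 * 3                # opponent always picks G_2
print(value, payoff_vs_g1, payoff_vs_g2)
```

Both payoffs coincide with the value, which is why neither player can gain by deviating from these probabilities.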


Those familiar with electrical networks will note that the rules for computing 
the value of series or parallel games in terms of the values of the component games 
are precisely the same as the rules for computing the effective resistance of a pair 
of resistors in series or in parallel. In the next section, we explore a game that 
exploits this connection. 


3.1.1. Resistor networks and troll games. 


EXAMPLE 3.1.3 (Troll and Traveler). A troll (player I) and a traveler (player 
II) will each choose a route along which to travel from Syracuse (s) to Troy (t) and 
then they will disclose their routes. Each road has an associated toll. In each case 
where the troll and the traveler have chosen the same road, the traveler pays the 
toll to the troll. 


In the special case where there are exactly two parallel roads from s to t (or
two roads in series), this is the parallel-sum (respectively, series-sum) game we saw 
earlier. For graphs that are constructed by repeatedly combining graphs in series 
or in parallel, there is an elegant and general way to solve the Troll and Traveler 
game, by interpreting the road network as an electrical network and the tolls as 
resistances. 


DEFINITION 3.1.4. An electrical network is a finite connected graph G with
positive edge labels (representing edge resistances) and a specified source s and sink t.


FIGURE 3.2. The Troll and the Traveler.


We combine two networks $G_1$ and $G_2$, with sources $s_i$ and sinks $t_i$, for i = 1, 2,
either in series, by identifying $t_1$ with $s_2$, or in parallel, by identifying $s_1$ with
$s_2$, and $t_1$ with $t_2$.

A network G is a series-parallel network if it is either a single directed 
edge from the source to the sink or it is obtained by combining two series-parallel 
networks G; and G2 in series or in parallel. 


See the upper left of for a graph constructed by combining two edges 
in series and in parallel, rite a more complex series-parallel graph. 

Recall that if two nodes are connected by a resistor with resistance R and there
is a voltage drop of V across the two nodes, then the current that flows through the
resistor is V/R. The conductance is the reciprocal of the resistance. When the pair
of nodes are connected by a pair of resistors with resistances $R_1$ and $R_2$ arranged
in series (see the top of Figure 3.3), the effective resistance between the nodes
is $R_1 + R_2$, because the current that flows through the resistors is $V/(R_1 + R_2)$.
When the resistors are arranged in parallel (see the bottom of Figure 3.3), it is
the conductances that add; i.e., the effective conductance between the nodes is
$1/R_1 + 1/R_2$ and the effective resistance is

    $\frac{1}{1/R_1 + 1/R_2} = \frac{R_1 R_2}{R_1 + R_2}.$


These series and parallel rules for computing the effective resistance can be 
used repeatedly to compute the effective resistance of any series-parallel network, 
as illustrated in Figure 3.4. Applying this argument inductively yields the following
claim. 


CLAIM 3.1.5. The value of the Troll and Traveler game played on a series-parallel
network G with source s and sink t is the effective resistance between s and
t. Optimal strategies in the Troll and Traveler game are defined as follows: If G
is obtained by combining $G_1$ and $G_2$ in series, then each player plays his or her
optimal strategy in $G_1$ followed by his or her optimal strategy in $G_2$. If G is obtained by
combining $G_1$ and $G_2$ (with sources $s_1, s_2$ and sinks $t_1, t_2$) in parallel, then each


FIGURE 3.3. In a network consisting of two resistors with resistances $R_1$ and
$R_2$ in series (shown on top), the effective resistance is $R_1 + R_2$. When the
resistors are in parallel, the effective conductance is $1/R_1 + 1/R_2$, so the
effective resistance is $1/(1/R_1 + 1/R_2) = R_1 R_2/(R_1 + R_2)$. If these figures
represent the roads leading from s to t in the Troll and Traveler game and the
toll on each road corresponds to the resistance on that edge, then the effective
resistance is the value of the game, and the optimal strategy for each player
is to move along an edge with probability proportional to the conductance on
that edge.


FIGURE 3.4. A resistor network, with all resistances equal to 1, has an
effective resistance of 3/5. Here the parallel rule was used first, then the series
rule, and then the parallel rule again.


player plays his or her optimal strategy in $G_i$ with probability $C_i/(C_1 + C_2)$, where $C_i$ is
the effective conductance between $s_i$ and $t_i$ in $G_i$.
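The recursive recipe in Claim 3.1.5 can be sketched in a few lines of Python. The nested-tuple encoding of networks, the edge names, and the particular four-edge topology below are illustrative assumptions: the topology is one network consistent with the parallel, series, parallel reduction described in the caption of Figure 3.4, not necessarily the one drawn there. The sketch computes the effective resistance, builds the route distribution from the claim, and checks that it earns the same expected toll against every route the opponent might pick.

```python
from fractions import Fraction
from itertools import product

# Hypothetical encoding: ("edge", name, toll), ("series", a, b), ("parallel", a, b).
# Edge names are assumed to be distinct.
def edge(name, toll):
    return ("edge", name, Fraction(toll))

def series(a, b):
    return ("series", a, b)

def parallel(a, b):
    return ("parallel", a, b)

def resistance(net):
    """Effective resistance via the series and parallel rules."""
    if net[0] == "edge":
        return net[2]
    r1, r2 = resistance(net[1]), resistance(net[2])
    return r1 + r2 if net[0] == "series" else r1 * r2 / (r1 + r2)

def tolls(net):
    """Map each edge name to its toll."""
    if net[0] == "edge":
        return {net[1]: net[2]}
    return {**tolls(net[1]), **tolls(net[2])}

def strategy(net):
    """Distribution over s-t routes (frozensets of edge names) as in the claim."""
    if net[0] == "edge":
        return {frozenset([net[1]]): Fraction(1)}
    d1, d2 = strategy(net[1]), strategy(net[2])
    if net[0] == "series":
        # Play independently in each part and concatenate the routes.
        return {r1 | r2: p1 * p2
                for (r1, p1), (r2, p2) in product(d1.items(), d2.items())}
    # Parallel: choose a part with probability proportional to its conductance.
    c1, c2 = 1 / resistance(net[1]), 1 / resistance(net[2])
    mixed = {r: p * c1 / (c1 + c2) for r, p in d1.items()}
    mixed.update({r: p * c2 / (c1 + c2) for r, p in d2.items()})
    return mixed

def expected_toll(dist, route, toll):
    """Expected toll collected when one side plays dist and the other plays route."""
    return sum(p * sum(toll[e] for e in r & route) for r, p in dist.items())

net = parallel(series(edge("a", 1), parallel(edge("b", 1), edge("c", 1))),
               edge("d", 1))
dist = strategy(net)
toll = tolls(net)
value = resistance(net)
payoffs = [expected_toll(dist, route, toll) for route in dist]
print(value, payoffs)
```

Because the payoff is symmetric in the two routes, the same distribution serves both the troll (guaranteeing at least the value) and the traveler (guaranteeing at most the value).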


3.2. Hide and Seek games 


EXAMPLE 3.2.1 (Hide and Seek). A robber, player II, hides in one of a set 
of safehouses located at certain street/avenue intersections in Manhattan. A cop, 
player I, chooses one of the avenues or streets to travel along. The cop wins a unit 
payoff if she travels on a road that intersects the robber’s location and nothing 
otherwise. 


We represent this situation with a 0/1 matrix H where $h_{ij} = 1$ if there is a
safehouse at the intersection of street i and avenue j, and $h_{ij} = 0$ otherwise. The


FIGURE 3.5. The figure shows an example scenario for the Hide and Seek 
game. In this example, the robber chooses to hide at the safehouse at the 
intersection of 2nd St. and 4th Ave., and the cop chooses to travel along 1st 
St. Thus, the payoff to the cop is 0. 


following matrix corresponds to Figure 3.5:
0 0 1 0 
0 0 1 1 
0 0 1 0 

The cop’s actions correspond to choosing a row or column of this matrix and 
the robber’s actions correspond to picking a 1 in the matrix. 

Clearly, it is useless for the cop to choose a road that doesn’t contain a safe- 
house; a natural strategy for her is to find a smallest set of roads that contain 
all safehouses and choose one of these at random. Formally, a line-cover of the 
matrix H is a set of lines (rows and columns) that cover all nonzero entries of H. 
The proposed cop strategy is to fix a minimum-sized line-cover C and choose one 
of the lines in C uniformly at random. This guarantees the cop an expected gain of 
at least 1/|C| against any robber strategy. 

Next we consider robber strategies. A vulnerable strategy would be to choose 
from among a set of safehouses that all lie on the same road. The “opposite” of 
that is to find a maximum-sized set M of safehouses, where no two lie on the same 
road, and choose one of these uniformly at random. This guarantees that the cop’s 
expected gain is at most 1/|M|.

It is not obvious that the proposed strategies are optimal. However, in the next 
section, we prove that 


    |C| = |M|. (3.1)


This implies that the proposed pair of strategies is jointly optimal for Hide and 
Seek. 


3.2.1. Maximum matching and minimum covers. Given a set of boys 
B and a set of girls G, draw an edge between a boy and a girl if they know each 
other. The resulting graph is called a bipartite graph since there are two disjoint 
sets of nodes and all edges go between them. Bipartite graphs are ubiquitous. 
For instance, there is a natural bipartite graph where one set of nodes represents 
workers, the other set represents jobs, and an edge from worker w to job j means
that worker w can perform job j. Other examples involve customers and suppliers,
or students and colleges. 

A matching in a bipartite graph is a collection of disjoint edges, e.g., a set of 
boy-girl pairs that know each other, where every individual occurs in at most one 
pair. (See Figure 3.6.)

Suppose $|B| \le |G|$. Clearly there cannot be a matching that includes more than
|B| edges. Under what condition is there a matching of size |B|, i.e., a matching in
which every boy is matched to a girl he knows?


FIGURE 3.6. On the left is a bipartite graph where an edge between a boy and 
a girl means that they know each other. The edges in a matching are shown 
by bold lines in the figure on the right. 


An obvious necessary condition, known as Hall’s condition, is that each sub- 
set B’ of the boys collectively knows enough girls, at least |B’| of them. Hall’s 
Marriage Theorem asserts that this condition is also sufficient. 


THEOREM 3.2.2 (Hall’s Marriage Theorem). Suppose that B is a finite set
of boys and G is a finite set of girls. For any particular boy $b \in B$, let f(b) denote
the set of girls that b knows. For a subset $B' \subseteq B$ of the boys, let f(B’) denote
the set of girls that boys in B’ collectively know; i.e., $f(B') = \bigcup_{b \in B'} f(b)$. There is
a matching of size |B| if and only if Hall’s condition holds: Every subset $B' \subseteq B$
satisfies $|f(B')| \ge |B'|$.


PROOF. We need only prove that Hall’s condition is sufficient, which we do by 
induction on the number of boys. The case |B| = 1 is straightforward. For the 
induction step, we consider two cases: 

Case 1: $|f(B')| > |B'|$ for each nonempty $B' \subsetneq B$. Then we can just match an
arbitrary boy b to any girl he knows. The set of remaining boys and girls still
satisfies Hall’s condition, so by the inductive hypothesis, we can match them up.

Case 2: There is a nonempty $B' \subsetneq B$ for which $|f(B')| = |B'|$. By the inductive
hypothesis, there is a matching of size |B’| between B’ and f(B’). Once we show
that Hall’s condition holds for the bipartite graph between B \ B’ and G \ f(B’),
another application of the inductive hypothesis will yield the theorem.

Suppose Hall’s condition fails; i.e., there is a set A of boys disjoint from B’
such that the set S = f(A) \ f(B’) of girls they know outside f(B’) has |S| < |A|.
(See Figure 3.7.) Then

    $|f(A \cup B')| = |S \cup f(B')| < |A| + |B'|,$

violating Hall’s condition for the full graph, a contradiction.


FIGURE 3.7. Hall’s Marriage Theorem: Case 2 of the inductive argument.
By hypothesis there is a matching of size |B’| between B’ and f(B’). If
$|S| = |f(A) \setminus f(B')| < |A|$ for $A \subseteq B \setminus B'$, then the set $A \cup B'$ violates Hall’s
condition.


As we saw earlier, a useful way to represent a bipartite graph whose edges go
between vertex sets I and J is via its adjacency matrix H. This is a 0/1 matrix
where the rows correspond to vertices in I, the columns to vertices in J, and $h_{ij} = 1$
if and only if there is an edge between i and j. Conversely, any 0/1 matrix is the
adjacency matrix of a bipartite graph. A set of pairs $S \subseteq I \times J$ is a matching for
the adjacency matrix H if $h_{ij} = 1$ for all $(i, j) \in S$ and no two elements of S are in
the same row or column. This corresponds to a matching between I and J in the
graph represented by H.

For example, the following matrix is the adjacency matrix for the bipartite 
graph shown in Figure 3.6, with the edges corresponding to the matching in bold
red in the matrix. (Rows represent boys from top to bottom and columns represent 
girls from left to right.) 


1 1 0 0 
0 1 0 0 
1 0 0 1 
0 0 1 0 
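To make Hall’s condition concrete in matrix form, the following brute-force Python sketch checks the condition on every subset of rows and compares the answer with a direct search for a matching that uses every row. The function names and the second, failing example `H_bad` are illustrative assumptions; the exhaustive search is only sensible for tiny matrices such as the one above.

```python
from itertools import combinations, permutations

# Adjacency matrix from the text: rows are boys, columns are girls.
H = [[1, 1, 0, 0],
     [0, 1, 0, 0],
     [1, 0, 0, 1],
     [0, 0, 1, 0]]

# Hypothetical counterexample: the first two boys both know only the first girl.
H_bad = [[1, 0, 0],
         [1, 0, 0],
         [0, 1, 1]]

def hall_condition(M):
    """True if every set of rows collectively meets at least as many columns."""
    m, n = len(M), len(M[0])
    for size in range(1, m + 1):
        for rows in combinations(range(m), size):
            known = {j for i in rows for j in range(n) if M[i][j]}
            if len(known) < size:
                return False
    return True

def has_row_saturating_matching(M):
    """True if each row can be assigned a distinct column holding a 1."""
    m, n = len(M), len(M[0])
    return any(all(M[i][cols[i]] for i in range(m))
               for cols in permutations(range(n), m))

print(hall_condition(H), has_row_saturating_matching(H))
print(hall_condition(H_bad), has_row_saturating_matching(H_bad))
```

In both examples the two tests agree, as Hall’s Marriage Theorem predicts.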


We restate Hall’s Marriage Theorem in matrix language and in graph language. 


THEOREM 3.2.3 (Hall’s Marriage Theorem — matrix version). Let H be 
an m x n 0/1 matrix with $m \le n$. Given a set S of rows, say column j
intersects S positively if $h_{ij} = 1$ for some $i \in S$. Suppose that for any set S of rows
in H, there are at least |S| columns in H that intersect S positively. Then there is
a matching of size m in H.


THEOREM 3.2.4 (Hall’s Marriage Theorem — graph version). Let G = 
(U, V, E) be a bipartite graph, with |U| = m, |V| = n, with $m \le n$. Suppose that
the neighborhood¹ of each subset of vertices $S \subseteq U$ has size at least |S|. Then there
is a matching of size m in G.


¹ The neighborhood of a set S of vertices in a graph is $\{v \mid \exists u \in S \text{ such that } (u, v) \in E\}$.


LEMMA 3.2.5 (König’s lemma). Given an m x n 0/1 matrix H, the size of
the maximum matching is equal to the size of the minimum line-cover.²


PROOF. Suppose the maximum matching has size k and the minimum line-cover
C has size $\ell$. At least one member of each pair in the matching has to be in
C and therefore $k \le \ell$.

For the other direction, we use Hall’s Marriage Theorem. Suppose that there
are r rows and c columns in the minimum line-cover C, so $r + c = \ell$. Let S be a
subset of rows in C, and let T be the set of columns outside C that have a positive
intersection with some row of S. Then $(C \setminus S) \cup T$ is also a line-cover, so by the
minimality of C, we have $|T| \ge |S|$. Thus, Hall’s condition is satisfied for the rows
in C and the columns outside C, and hence there is a matching M of size r in this
submatrix. Similarly, there is a matching M’ of size c in the submatrix defined by
the rows outside C and the columns in C. Therefore, $M \cup M'$ is a matching of size
at least $\ell$, and hence $\ell \le k$, completing the proof. See Figure 3.8.


FIGURE 3.8. An illustration of the last part of the proof of Lemma 3.2.5. The
first r rows and c columns in the matrix are in the cover C. If T, as defined 
in the proof, was smaller than S, this would contradict the minimality of C. 


COROLLARY 3.2.6. For the Hide and Seek game, an optimal strategy for the 
cop is to choose uniformly at random a line in a minimum line-cover. An optimal 
strategy for the robber is to hide at a uniformly random safehouse in a maximum 
matching. 
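The corollary can be checked on the matrix shown for Figure 3.5 with a short Python sketch. The augmenting-path search and the construction of the cover from alternating paths are standard techniques swapped in here for illustration; the proof above argues via Hall’s theorem instead. All function names are assumptions.

```python
from fractions import Fraction

# The 0/1 matrix given in the text for Figure 3.5.
H = [[0, 0, 1, 0],
     [0, 0, 1, 1],
     [0, 0, 1, 0]]

def max_matching(M):
    """Maximum matching via augmenting paths; returns a set of (row, col) pairs."""
    m, n = len(M), len(M[0])
    match_col = [-1] * n                     # match_col[j] = row matched to column j

    def augment(i, seen):
        for j in range(n):
            if M[i][j] and j not in seen:
                seen.add(j)
                if match_col[j] == -1 or augment(match_col[j], seen):
                    match_col[j] = i
                    return True
        return False

    for i in range(m):
        augment(i, set())
    return {(match_col[j], j) for j in range(n) if match_col[j] != -1}

def min_line_cover(M, matching):
    """Line-cover of the same size as a maximum matching, via alternating paths."""
    m, n = len(M), len(M[0])
    row_of = {j: i for i, j in matching}
    reach_rows = set(range(m)) - {i for i, _ in matching}   # unmatched rows
    reach_cols = set()
    frontier = list(reach_rows)
    while frontier:
        i = frontier.pop()
        for j in range(n):
            if M[i][j] and j not in reach_cols:
                reach_cols.add(j)
                k = row_of.get(j)
                if k is not None and k not in reach_rows:
                    reach_rows.add(k)
                    frontier.append(k)
    rows = sorted(set(range(m)) - reach_rows)
    return [("row", i) for i in rows] + [("col", j) for j in sorted(reach_cols)]

def hits(line, house):
    kind, idx = line
    return house[0] == idx if kind == "row" else house[1] == idx

matching = max_matching(H)
cover = min_line_cover(H, matching)
safehouses = [(i, j) for i, row in enumerate(H) for j, h in enumerate(row) if h]
lines = [("row", i) for i in range(len(H))] + [("col", j) for j in range(len(H[0]))]

# Cop picks a uniform line of the cover: worst case over safehouses.
cop_guarantee = min(Fraction(sum(hits(l, s) for l in cover), len(cover))
                    for s in safehouses)
# Robber picks a uniform safehouse of the matching: best case over the cop's lines.
robber_guarantee = max(Fraction(sum(hits(l, s) for s in matching), len(matching))
                       for l in lines)
print(len(matching), cover, cop_guarantee, robber_guarantee)
```

The two guarantees coincide, so on this example the value of Hide and Seek equals 1/|C| = 1/|M|.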


3.3. A pursuit-evasion game: Hunter and Rabbit* 


Consider the following game.³ A hunter (player I) is chasing a rabbit (player II).
At every time step, each player occupies a vertex of the cycle $\mathbb{Z}_n$. At time 0, the
hunter and rabbit choose arbitrary initial positions. At each subsequent step, the
hunter may move to an adjacent vertex or stay where she is; simultaneously, the
rabbit may stay where he is or jump to any vertex on the cycle. The hunter captures
the rabbit at time t if both players occupy the same vertex at that time. Neither
player can see the other’s position unless they occupy the same vertex. The payoff
to the hunter is 1 if she captures the rabbit in the first n steps, and 0 otherwise.


² This is also called a cover or a vertex cover.
³ This game was first analyzed in [ARS+03]; the exposition here follows, almost verbatim,
the paper [BPP+14].




FIGURE 3.9. The hunter and the rabbit. 


Clearly, the value of the game is the probability $p_n$ of capture under optimal
play.
Here are some possible rabbit strategies:
• If the rabbit chooses a random node and stays there, the hunter can sweep
the cycle and capture the rabbit with probability 1.
• If the rabbit jumps to a uniformly random node at every step, he will be
caught with probability 1/n at each step. Thus, the probability of capture
in n steps is $1 - (1 - 1/n)^n \to 1 - 1/e$ as $n \to \infty$.
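The limit in the second bullet is easy to tabulate. The short Python sketch below (the function name is an illustrative assumption) evaluates the capture probability against the uniformly jumping rabbit for a few values of n and compares it with 1 − 1/e.

```python
import math

def capture_prob_uniform_jumps(n):
    """Probability of at least one capture in n steps, each with chance 1/n."""
    return 1 - (1 - 1 / n) ** n

limit = 1 - 1 / math.e
for n in (10, 100, 1000, 10**6):
    print(n, capture_prob_uniform_jumps(n), limit)
```

The values decrease toward the limit, so this rabbit strategy is caught with probability bounded away from 0, far more often than under optimal play.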


The Sweep strategy for the hunter consists of choosing a uniformly random
starting point and a random direction and then walking in that direction. A rabbit
counterstrategy is the following: From a random starting node, walk $\sqrt{n}$ steps to the
right, then jump $2\sqrt{n}$ steps to the left; repeat. Figure 3.10 shows a representation of
the rabbit counterstrategy to Sweep. Consider the space-time integer lattice, where
the vertical coordinate represents time t and the horizontal coordinate represents
the position x on the circle. Since, during the n steps, the rabbit’s space-time
path will intersect $\Theta(\sqrt{n})$ diagonal lines (lines of the form $x = t + i \bmod n$) and the
hunter traverses exactly one random diagonal line in space-time, the probability of
capture is $\Theta(1/\sqrt{n})$. In fact, the Sweep strategy guarantees the hunter a probability
of capture $\Omega(1/\sqrt{n})$ against any rabbit strategy. (See Exercise 3.6.)

It turns out that, to within a constant factor, the best the hunter can do
is to start at a random point and move at a random speed. We analyze this
strategy in the next section and show that it increases the probability of capture
to $\Theta(1/\log n)$.


3.3.1. Towards optimal strategies. Let $H_t$ be the position of the hunter
at time t. Then the set of pure strategies $\mathcal{H}$ available to her is

    $\mathcal{H} = \{(H_t)_{t=0}^{n-1} : H_t \in \mathbb{Z}_n, \ |H_{t+1} - H_t| \le 1\}.$


FIGURE 3.10. The figure shows a typical path in space-time for a rabbit
employing the counterstrategy to Sweep.


Similarly, if $R_t$ is the position of the rabbit at time t, then $\mathcal{R}$ is the set of pure
strategies available to him:

    $\mathcal{R} = \{(R_t)_{t=0}^{n-1} : R_t \in \mathbb{Z}_n\}.$


If V is the value of the game, then there exists a randomized strategy for the 
hunter so that against every strategy of the rabbit the probability that they collide 
in the first n steps is at least V; and there exists a randomized strategy for the 
rabbit, so that against every strategy of the hunter, the probability that they collide 
is at most V. 


THEOREM 3.3.1. There are positive constants c and c’ such that

    $\frac{c}{\log n} \le p_n \le \frac{c'}{\log n}.$    (3.2)


3.3.2. The hunter’s strategy. Consider the following Random Speed strategy
for the hunter: Let a, b be independent random variables uniformly distributed
on [0, 1] and define $H_t = \lfloor an + bt \rfloor \bmod n$ for $0 \le t < n$.


PROPOSITION 3.3.2. If the hunter employs the Random Speed strategy, then
against any rabbit strategy,

    $P(\text{capture}) \ge \frac{c}{\log n},$

where c is a universal positive constant.


PROOF. Let $R_t$ be the location of the rabbit on the cycle at time t; i.e.,
$R_t \in \{0, \ldots, n-1\}$. Denote by $K_n$ the number of collisions before time n; i.e.,
$K_n = \sum_{t=0}^{n-1} 1_{I_t}$, where

    $I_t = \{H_t = R_t\} = \{an + bt \in [R_t, R_t + 1) \cup [R_t + n, R_t + n + 1)\}.$    (3.3)

By Lemma B.1.1, we have

    $P(K_n > 0) \ge \frac{(E[K_n])^2}{E[K_n^2]}.$    (3.4)

For any fixed b and t, the random variable (an + bt) mod n is uniformly distributed
in [0, n). Therefore $\lfloor an + bt \rfloor \bmod n$ is uniform on {0,...,n − 1}, so $P(I_t) = 1/n$.
Thus,

    $E[K_n] = \sum_{t=0}^{n-1} P(I_t) = 1.$    (3.5)


Next, we estimate the second moment:

    $E[K_n^2] = E\Big[\Big(\sum_{t=0}^{n-1} 1_{I_t}\Big)^2\Big] = E[K_n] + \sum_{t \ne m} P(I_t \cap I_m) = 1 + 2 \sum_{t=0}^{n-1} \sum_{j=1}^{n-t-1} P(I_t \cap I_{t+j}).$    (3.6)

To bound $P(I_t \cap I_{t+j})$, observe that for any r, s, the relations $an + bt \in [r, r + 1)$
and $an + b(t + j) \in [s, s + 1)$ together imply that $bj \in [s - r - 1, s - r + 1]$, so

    $P(an + bt \in [r, r+1) \text{ and } an + b(t+j) \in [s, s+1))$
        $\le P(bj \in [s - r - 1, s - r + 1]) \cdot \max_b P(an + bt \in [r, r+1) \mid b) \le \frac{2}{j} \cdot \frac{1}{n}.$

Summing over $r \in \{R_t, n + R_t\}$ and $s \in \{R_{t+j}, n + R_{t+j}\}$, we obtain

    $P(I_t \cap I_{t+j}) \le \frac{8}{jn}.$

Plugging back into (3.6), we obtain

    $E[K_n^2] \le 1 + 2 \sum_{t=0}^{n-1} \sum_{j=1}^{n} \frac{8}{jn} \le C \log n,$    (3.7)

for a positive constant C. Combining (3.5) and (3.7) using (3.4), we obtain

    $P(\text{capture}) = P(K_n > 0) \ge \frac{1}{C \log n},$


completing the proof of the proposition. 


3.3.3. The rabbit’s strategy. In this section we prove the upper bound in
Theorem 3.3.1 by constructing a randomized strategy for the rabbit. It is natural
for the rabbit to try to maximize the uncertainty the hunter has about his location. 
Thus, he will choose a strategy in which, at each point in time, the probability 
that he is at any particular location on the cycle is 1/n. With such a strategy, the 
expected number of collisions of the hunter and the rabbit over the n steps is 1. 
However, the rabbit’s goal is to ensure that the probability of collision is low; this 
will be obtained by concentrating the collisions on a small number of paths. 

As before, let $K_t$ be the number of collisions during the first $t$ time steps. As
a computational device, we will extend both players’ strategies to $2n$ steps. Since

$E[K_{2n}] \ge E[K_{2n} \mid K_n > 0] \cdot P(K_n > 0)$

and $E[K_{2n}] = 2$, we have

$P(K_n > 0) \le \frac{2}{E[K_{2n} \mid K_n > 0]}$. (3.8)

To keep $P(K_n > 0)$ small, we will construct the rabbit’s strategy so that $E[K_{2n} \mid K_n > 0]$
is large; that is, given that the rabbit and the hunter meet in the first $n$ steps, the
expected number of meetings in $2n$ steps is large.


FIGURE 3.11. The figure shows how the random walk determines $R_{t+1}$ from $R_t$.


FIGURE 3.12. The figure shows a random walk escaping the $2t \times 2t$ square.
Since the walk first hits the line $y = t$ at position $(k, t)$, at time $t$, the rabbit
is in position $k$.


The strategy is easiest to describe via the following thought experiment: Again,
draw the space-time integer lattice, where the y-coordinate represents time and the
x-coordinate represents the position on the circle. We identify all the points $(x + in, y)$
for integer $i$. Suppose that at time $t$, the rabbit is at position $R_t$. Execute a simple
random walk on the 2D lattice starting at $(x,y) = (R_t, t)$. At the next step, the
rabbit jumps to $R_{t+1}$, where $(R_{t+1}, t+1)$ is the first point at which the random
walk hits the line $y = t + 1$. See Figure 3.11.


LEMMA 3.3.3. There is a constant $c > 0$ such that for all $\ell \in [-t, t]$,

$P(R_t = (R_0 + \ell) \bmod n) \ge \frac{c}{t}$. (3.9)


FIGURE 3.13. The figure illustrates the reflection principle. The purple 
(darker) path represents the random walk. The pink (lighter) line is the re- 
flection of that path across the line x = k/2 starting from the first time the 
random walk hits the line. 


PROOF. Let $(\tilde{R}_t, t)$ be the first point that a simple random walk starting at
$(0,0)$ hits on the line $y = t$. Let $S_t$ be the square $[-t,t]^2$. First, since a random
walk starting at $(0,0)$ is equally likely to exit this square on each of the four sides,
we have

$P(\text{random walk exits } S_t \text{ on top}) = \frac{1}{4}$. (3.10)

(See Figure 3.12.) Therefore

$\sum_{k=-t}^{t} P(\tilde{R}_t = k) \ge \frac{1}{4}$. (3.11)

Next, we show that

$P(\tilde{R}_t = 0) \ge P(\tilde{R}_t = k)$ (3.12)

for all $k \in [-t,t]$. First consider the case where $k$ is even. An application of the
reflection principle (see Figure 3.13) shows that for each path from $(0,0)$ to $(k,t)$
there is another equally likely path from $(0,0)$ to $(0,t)$. To handle the case of $k$ odd,
we extend our lattice by adding a mid-point along every edge. This slows down the
random walk, but does not change the distribution of $\tilde{R}_t$ and allows the reflection
argument to go through.
Inequalities (3.11) and (3.12) together imply that

$P(\tilde{R}_t = 0) \ge \frac{1}{4(2t+1)} \ge \frac{c}{t}$. (3.13)

To prove that there is a constant $c$ such that

$P(\tilde{R}_t = k) \ge \frac{c}{t}$,

consider the smaller square from $-k$ to $k$. With probability 1/4, the random walk
first hits the boundary of the square on the right side. If so, run the argument used
to show (3.13) starting from this hitting point. See Figure 3.14.


FIGURE 3.14. The figure shows a random walk escaping the $2k \times 2k$ square.


COROLLARY 3.3.4. There is a constant $c$ such that

$E[K_{2n} \mid K_n > 0] \ge c \log(n)$.


PROOF. Suppose that the hunter and the rabbit both start out at position
(0,0) together. Then the position of the hunter on the cycle at time $t$ must be in
$\{-t, \dots, t\}$. The lemma then implies that the probability of a collision at time $t$ is at
least $c/t$. Thus, in $T$ steps, the expected number of collisions is at least $c \log(T)$. Let
$F_t$ denote the event that the hunter and the rabbit first collide at time $t$. Observing
that $\sum_{0 \le t < n} P(F_t \mid K_n > 0) = 1$, we have

$E[K_{2n} \mid K_n > 0] = \sum_{0 \le t < n} E[K_{2n} \mid F_t] \, P(F_t \mid K_n > 0) \ge \sum_{0 \le t < n} c \log(2n - t) \, P(F_t \mid K_n > 0) \ge c \log n$.


Substituting the result of this corollary into (3.8) completes the proof of the
upper bound in Theorem 3.3.1.


3.4. The Bomber and Battleship game


In this family of games, a battleship is initially located at the origin in $\mathbb{Z}$.
At each time step in $\{0,1,\dots\}$, the ship moves either left or right to a new site
where it remains until the next time step. The bomber (player I), who can see
the current location of the battleship (player II), drops one bomb at some time $j$
over some site in $\mathbb{Z}$. The bomb arrives at time $j + 2$ and destroys the battleship if
it hits it. (The battleship cannot see the bomber or the bomb in time to change
course.) For the game $G_n$, the bomber has enough fuel to drop its bomb at any
time $j \in \{0,1,\dots,n\}$. What is the value of the game?


EXERCISE 3.b. (i) Show that the value of $G_0$ is 1/3. (ii) Show that the value
of $G_1$ is also 1/3. (iii) Show that the value of $G_2$ is greater than 1/3.




FIGURE 3.15. The bomber drops her bomb where she hopes the battleship 
will be two time units later. In the picture the bomber is at position 1 at 
time 1 and drops a bomb. The battleship does not see the bomb coming and 
randomizes his path to avoid the bomb, but if he is at position 1 at time 3, 
he will be hit. 


Consider the following p-reversal strategy for the battleship in $G_n$. On the first
move, go left with probability $p$ and right with probability $1 - p$. From then on, at
each step reverse direction with probability $1 - p$ and keep going with probability $p$.

The battleship chooses $p$ to maximize the probability of survival. Its probabilities
of arrival at sites $-2$, $0$, or $2$ at time 2 are $p^2$, $1-p$, and $p(1-p)$. Thus, $p$ will
be chosen so that $\max\{p^2, 1 - p\}$ is minimized. This is attained when $p^2 = 1 - p$,
whose solution in $(0,1)$ is given by $p = 2/(1 + \sqrt{5})$. For any time $j$ that the bomber
chooses to drop a bomb, the battleship’s relative position two time steps later has
the same distribution. Therefore, the payoff for the bomber against this strategy
is at most $1 - p$, so $v(G_n) \le 1 - p$ for every $n$. While there is no value of $n$ for
which the p-reversal strategy is optimal in $G_n$, it is asymptotically optimal, i.e.,
$\lim_{n \to \infty} v(G_n) = 1 - p = (\sqrt{5} - 1)/(\sqrt{5} + 1)$. See the notes.
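
The choice of $p$ can be checked numerically. The following Python sketch is illustrative only: the helper `two_step_distribution` is a name introduced here, and displacements are measured along the battleship's current heading (for the first move, the heading is taken to be "left"). It confirms that at $p = 2/(1+\sqrt{5})$ the most likely two-step displacement has probability $1 - p$, and that no other $p$ on a coarse grid does better for the battleship.

```python
import math

def two_step_distribution(p):
    """Distribution of the displacement after two moves, along the heading."""
    keep, rev = p, 1 - p
    return {
        2: keep * keep,             # keep the heading on both moves
        0: keep * rev + rev * rev,  # reverse on the second move, or twice
        -2: rev * keep,             # reverse once, then keep going
    }

p = 2 / (1 + math.sqrt(5))
dist = two_step_distribution(p)
best_hit = max(dist.values())  # the bomber aims at the most likely site

print(round(p, 6), round(best_hit, 6))
# Compare against other values of p on a grid: none gives a smaller maximum.
print(all(max(two_step_distribution(k / 100).values()) >= best_hit - 1e-12
          for k in range(101)))
```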


Notes 


A generalization of the Troll and Traveler game from §3.1.1 can be played on an
arbitrary (not necessarily series-parallel) undirected graph with two distinguished vertices
$s$ and $t$: If the troll and the traveler traverse an edge in the same direction, the traveler
pays the cost of the road to the troll, whereas if they traverse a road in opposite directions,
then the troll pays the cost of the road to the traveler. If we interpret the cost of each
road $e$ as an edge resistance $R_e$, then the value of the game turns out to be the effective
resistance between $s$ and $t$: There is a unique unit flow $F$ from $s$ to $t$ (called the unit
current flow) that satisfies the cycle law $\sum_e R_e F_e = 0$ along any directed cycle. This flow
can be decomposed into a convex combination of path flows from $s$ to $t$. Let $p_\gamma$ be the
weight for path $\gamma$ in this convex combination. Then an optimal strategy for each player
is to choose $\gamma$ with probability $p_\gamma$. For more details on effective resistance and current
flows, see, e.g., [DS84, LPW09].

The Hide and Seek game in §3.2 comes from [vN53]. The theory of maximum matching
and minimum covers was developed by Frobenius [Fro17] and König [Kön31],
and rediscovered by Hall [Hal35]. For a detailed history, see Section 16.7h in Schrijver
[Sch03], and for a detailed exposition of these topics, see, e.g., [LP09a, vLW01, Sch03].

As noted in §3.3, the Hunter and Rabbit game discussed there was first analyzed
in [ARS+03]; our exposition follows verbatim the paper [BPP+14].

An interesting open problem is whether there is a hunter strategy that captures a 
weak rabbit (that can only jump to adjacent nodes) in n steps with constant probability 
on any n vertex graph. 

The Hunter and Rabbit game is an example of a search game. These games are the 
subject of the book [AG03]. See also the classic [Isa65]. 

The Bomber and Battleship game of §3.4 was proposed by R. Isaacs [Isa55], who devised
the battleship strategy discussed in the text. The value of the game was determined
by Dubins [Dub57] and Karlin [Kar57].
is from Kleinberg and Tardos [KT06]. is from [vN53].


Exercises 


3.1. Solve Troll and Traveler on the graph in Figure 3.16 assuming that the toll
on each edge is 1.


FIGURE 3.16. A series-parallel graph. 


3.2. Prove that every k-regular bipartite graph has a perfect matching. (A bi- 
partite graph is k-regular if it is n x n, and each vertex has exactly k edges 
incident to it.) 


3.3. Let $M$ be the 0-1 matrix corresponding to a bipartite graph $G$. Show that
if there is no perfect matching in $M$, then there is a set $I$ of rows of $M$ and
a set $J$ of columns of $M$ such that $|I| + |J| > n$ and there is no edge from
$I$ to $J$.


3.4. Birkhoff-von Neumann Theorem: Prove that every doubly stochastic 
n x n matrix is a convex combination of permutation matrices. 
Hint: Use Hall’s Marriage Theorem and induction on the number of nonzero 
entries. 


3.5. Let $G$ be an $n \times n$ bipartite graph, where vertices on one side correspond
to actors, vertices on the other side correspond to actresses, and there
is an edge between actor $i$ and actress $j$ if they have starred in a movie
together. Consider a game where the players alternate naming names, with
player I naming an actor $i$ from the left side of the graph, then player II
naming an actress $j$ that has starred with $i$ from the right side, then player
I naming an actor $i'$ from the left side that has starred with $j$, and so on.
No repetition of names is allowed. The player who cannot respond with
a new name loses the game. For an example, see Figure 3.17. Show that
II has a winning strategy if the graph $G$ has a perfect matching, and I
has a winning strategy otherwise. Hint: Player I’s winning strategy in the
latter case is to find a maximum matching in the graph, and then begin by
naming an actor that is not in that maximum matching.


Johnny Depp Angelina Jolie 
Brad Pitt Penelope Cruz 
Tom Cruise Nicole Kidman 


FIGURE 3.17. In this game, if player I names Johnny Depp, then player II 
must name either Angelina Jolie or Penelope Cruz. In the latter case, player 
I must then name Tom Cruise, and player II must name Nicole Kidman. At 
this point, player I cannot name a new actor and loses the game. 


3.6. Show that the Sweep strategy described in the text guarantees the hunter a
probability of capture $\Omega(1/\sqrt{n})$ against any rabbit strategy. Hint: Project
the space-time path of the rabbit on both diagonals. One of these projections
must have size $\Omega(\sqrt{n})$.


3.7. Prove that the following hunter strategy also guarantees probability of capture
$\Omega(1/\log(n))$. Pick a uniform $u \in [0,1]$ and a random starting point.
At each step, walk to the right with probability $u$, and stay in place otherwise.




3.8. Suppose that the Hunter and Rabbit game runs indefinitely. Consider the
zero-sum game in which the gain to the rabbit is the time until he is captured.
Show that the hunter has a strategy guaranteeing that the expected
capture time is $O(n \log n)$ and that the rabbit has a strategy guaranteeing
that the expected capture time is $\Omega(n \log n)$.


3.9. Consider the Hunter and Rabbit game on an arbitrary undirected $n$-vertex
graph $G$. Show that there is a hunter strategy guaranteeing that

$P(\text{capture in } n \text{ steps}) \ge \frac{c}{\log n}$,

where $c > 0$ is an absolute constant. Hint: Construct a spanning tree of
the graph and then reduce to the cycle case by traversing the spanning tree
in a depth-first order.


*3.10. Show that for the Hunter and Rabbit game on a $\sqrt{n} \times \sqrt{n}$ grid,

$P(\text{capture in } n \text{ steps}) \ge c > 0$

for some hunter strategy. Hint: Random direction, random speed.


3.11. Show that a weak rabbit that can only jump to adjacent nodes will be 
caught in n steps on the n-cycle with probability at least c > 0 by a sweep- 
ing hunter. 


3.12. A pirate hides a treasure somewhere on a circular desert island of radius $r$.
The police fly across the island $k$ times in a straight path. They will locate
the treasure (and win the game) if their path comes within distance $w$ of it.
See Figure 3.18. Find the value of the game and some optimal strategies.
Hint: Archimedes showed that the surface area on a sphere in $\mathbb{R}^3$ that lies
between two parallel planes that intersect the sphere is proportional to the
distance between the planes. Verify this (if you are so inclined) and use it
to construct an optimal strategy for the thief.


3.13. Set up the payoff matrix for the Bomber and Battleship game from §3.4
and find the value of the game $G_2$.


FIGURE 3.18. Plane criss-crossing a desert island.


General-sum games


We now turn to the theory of general-sum games. Such a game is specified
by two matrices $A = (a_{ij})$ and $B = (b_{ij})$. If player I chooses action $i$ and player II
chooses action $j$, their payoffs are $a_{ij}$ and $b_{ij}$, respectively. In contrast to zero-sum
games, there is no reasonable definition of “optimal strategies”. Safety strategies 
still exist, but they no longer correspond to equilibria. The most important notion 
is that of a Nash equilibrium, i.e., a pair of strategies, one per player, such that 
each is a best response to the other. General-sum games and the notion of Nash 
equilibrium extend naturally to more than two players. 


4.1. Some examples 


EXAMPLE 4.1.1 (Prisoner’s Dilemma). Two suspects are imprisoned by the 
police who ask each of them to confess. The charge is serious, but there is not 
enough evidence to convict. Separately, each prisoner is offered the following plea 
deal. If he confesses and the other prisoner remains silent, the confessor goes free, 
and his confession is used to sentence the other prisoner to ten years in jail. If both 
confess, they will both spend eight years in jail. If both remain silent, the sentence 
is one year to each for the minor crime that can be proved without additional 
evidence. 


FIGURE 4.1. Two prisoners considering whether to confess or remain silent. 


The following matrix summarizes the payoffs, where negative numbers represent
years in jail, and an entry $(-10, 0)$ means payoff $-10$ to prisoner I and 0 to prisoner
II.

                          prisoner II
                        silent      confess
prisoner I   silent     (-1, -1)    (-10, 0)
             confess    (0, -10)    (-8, -8)


In this game, the prisoners are better off if both of them remain silent than they are 
if both of them confess. However, the two prisoners select their actions separately, 
and for each possible action of one prisoner, the other is better off confessing; i.e., 
confessing is a dominant strategy. 

The same phenomenon occurs even if the players play this game a fixed number
of times. This can be shown by a backwards induction argument. (See Exercise 4.4.)
However, as we shall see in §6.4, if the game is played repeatedly, but play ends at
a random time, then the mutually preferable solution may become an equilibrium.


EXAMPLE 4.1.2 (Stag Hunt). Two hunters are following a stag when a hare 
runs by. Each hunter has to make a split-second decision: to chase the hare or to 
continue tracking the stag. The hunters must cooperate to catch the stag, but each 
hunter can catch the hare on his own. (If they both go for the hare, they share it.) 
A stag is worth four times as much as a hare. This leads to the following payoff
matrix:

                         Hunter II
                       Stag (S)   Hare (H)
Hunter I   Stag (S)    (4, 4)     (0, 2)
           Hare (H)    (2, 0)     (1, 1)

What are good strategies for the hunters? We begin by considering safety
strategies.¹ For each player, H is the unique safety strategy and yields a payoff of
1. The strategy pair (H, H) is also a pure Nash equilibrium, since given the choice 
by the other hunter to pursue a hare, a hunter has no incentive to continue tracking 
the stag. There is another pure Nash equilibrium, (S, S), which yields both players 
a payoff of 4. Finally, there is a mixed Nash equilibrium, in which each player 
selects S with probability 1/3. This results in an expected payoff of 4/3 to each 
player. 
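
The mixed equilibrium of Stag Hunt can be verified with exact arithmetic. The sketch below is an illustration, not part of the original exposition; the dictionary `A` and the helper names are introduced here. It checks that when the opponent hunts the stag with probability 1/3, both actions yield 4/3, and that S and H are each a best response to themselves.

```python
from fractions import Fraction

# Row player's payoffs in Stag Hunt. The game is symmetric, so the column
# player's payoffs are obtained by swapping the roles of the two actions.
A = {("S", "S"): 4, ("S", "H"): 0, ("H", "S"): 2, ("H", "H"): 1}

def payoff_vs_mix(action, q):
    """Expected payoff of a pure action when the opponent plays S w.p. q."""
    return q * A[(action, "S")] + (1 - q) * A[(action, "H")]

def pure_best_responses(opponent_action):
    """Pure actions maximizing the payoff against a fixed pure action."""
    best = max(A[(a, opponent_action)] for a in "SH")
    return [a for a in "SH" if A[(a, opponent_action)] == best]

q = Fraction(1, 3)
stag, hare = payoff_vs_mix("S", q), payoff_vs_mix("H", q)
print(stag, hare)                                          # indifference at q = 1/3
print(pure_best_responses("S"), pure_best_responses("H"))  # pure equilibria
```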

This example illustrates a phenomenon that doesn’t arise in zero-sum games: 
a multiplicity of equilibria with different expected payoffs to the players. 


EXAMPLE 4.1.3 (War and Peace). Two countries in conflict have to decide
between diplomacy and military action. One possible payoff matrix is:

                            country II
                          diplomacy   attack
country I   diplomacy     (2, 2)      (-2, 0)
            attack        (0, -2)     (-1, -1)


¹ A safety strategy for player I is defined as in Definition 2.2.1. For player II the same
definition applies with the payoff matrix B replacing A.


FIGURE 4.2. Stag Hunt.


Like Stag Hunt, this game has two pure Nash equilibria, where one arises from 
safety strategies, and the other yields higher payoffs. In fact, this payoff matrix is 
the Stag Hunt matrix, with all payoffs reduced by 2. 


EXAMPLE 4.1.4 (Driver and Parking Inspector). Player I is choosing be- 
tween parking in a convenient but illegal parking spot (payoff 10 if she’s not caught) 
and parking in a legal but inconvenient spot (payoff 0). If she parks illegally and is 
caught, she will pay a hefty fine (payoff —90). Player II, the inspector representing 
the city, needs to decide whether to check for illegal parking. There is a small cost 
(payoff —1) to inspecting. However, there is a greater cost to the city if player I has 
parked illegally since that can disrupt traffic (payoff —10). This cost is partially 
mitigated if the inspector catches the offender (payoff —6). 

The resulting payoff matrix is the following:

                          Inspector
                     Don’t Inspect   Inspect
Driver   Legal       (0, 0)          (0, -1)
         Illegal     (10, -10)       (-90, -6)


In this game, the safety strategy for the driver is to park legally (guaranteeing 
her a payoff of 0), and the safety strategy for the inspector is to inspect (guarantee- 
ing him/the city a payoff of —6). However, the strategy pair (legal, inspect) is not 
a Nash equilibrium. Indeed, knowing the driver is parking legally, the inspector’s 
best response is not to inspect. It is easy to check that this game has no Nash 
equilibrium in which either player uses a pure strategy. 

There is, however, a mixed Nash equilibrium. Suppose that the strategy pair
$(x, 1-x)$ for the driver and $(y, 1-y)$ for the inspector is a Nash equilibrium. If
$0 < y < 1$, then both possible actions of the inspector must yield him the same
payoff. If, for instance, inspecting yielded a higher payoff, then $(0,1)$ would be
a better strategy than $(y, 1-y)$. Thus, $-10(1-x) = -x - 6(1-x)$. Similarly,
$0 < x < 1$ implies that $0 = 10y - 90(1-y)$. These equations yield $x = 0.8$ (the
driver parks legally with probability 0.8 and obtains an expected payoff of 0) and
$y = 0.9$ (the inspector inspects with probability 0.1 and obtains an expected payoff
of $-2$).
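
The two indifference equations can be solved mechanically. The following Python sketch is an illustration under the conventions of the example ($x$ is the probability of parking legally, $y$ the probability of not inspecting); the helper `indifferent_mix` is a hypothetical name introduced here.

```python
from fractions import Fraction

def indifferent_mix(a, b, c, d):
    """Solve x*a + (1 - x)*c == x*b + (1 - x)*d for x.

    (a, c) are the payoffs of one action against the opponent's first and
    second action; (b, d) are the payoffs of the alternative action.
    """
    return Fraction(d - c, (a - b) + (d - c))

# x = P(driver parks legally). The inspector compares "don't inspect",
# with payoffs (0, -10), against "inspect", with payoffs (-1, -6).
x = indifferent_mix(0, -1, -10, -6)
# y = P(inspector does not inspect). The driver compares "legal",
# with payoffs (0, 0), against "illegal", with payoffs (10, -90).
y = indifferent_mix(0, 10, 0, -90)

inspector_payoff = x * 0 + (1 - x) * (-10)  # payoff of "don't inspect"
driver_payoff = y * 0 + (1 - y) * 0         # payoff of parking legally
print(x, y, inspector_payoff, driver_payoff)
```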


4.2. Nash equilibria 


A two-person general-sum game can be represented by a pair² of $m \times n$ payoff
matrices $A = (a_{ij})$ and $B = (b_{ij})$, whose rows are indexed by the $m$ possible
actions of player I and whose columns are indexed by the $n$ possible actions of
player II. Player I selects an action $i$ and player II selects an action $j$, each unaware
of the other’s selection. Their selections are then revealed and player I receives a
payoff of $a_{ij}$ and player II receives a payoff of $b_{ij}$.

A mixed strategy for player I is determined by a vector $(x_1, \dots, x_m)^T$ where
$x_i$ represents the probability that player I plays action $i$ and a mixed strategy for
player II is determined by a vector $(y_1, \dots, y_n)^T$ where $y_j$ is the probability that
player II plays action $j$. A mixed strategy in which a particular action is played
with probability 1 is called a pure strategy.


DEFINITION 4.2.1 (Nash equilibrium). A pair of mixed strategy vectors
$(x^*, y^*)$ with $x^* \in \Delta_m$ (where $\Delta_m = \{x \in \mathbb{R}^m : x_i \ge 0, \sum_{i=1}^m x_i = 1\}$) and $y^* \in \Delta_n$
is a Nash equilibrium if no player gains by unilaterally deviating from it. That
is,

$(x^*)^T A y^* \ge x^T A y^*$

for all $x \in \Delta_m$ and

$(x^*)^T B y^* \ge (x^*)^T B y$

for all $y \in \Delta_n$.

The game is called symmetric if $m = n$ and $a_{ij} = b_{ji}$ for all $i, j \in \{1, 2, \dots, n\}$.

A pair $(x, y)$ of strategies in $\Delta_n$ is called symmetric if $x_i = y_i$ for all $i = 1, \dots, n$.
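
Definition 4.2.1 can be tested numerically for a given pair of strategies. The sketch below is an illustration with hypothetical helper names (`expected`, `is_nash`); it checks only pure deviations, which suffices because $x^T A y^*$ is linear in $x$ (and $(x^*)^T B y$ is linear in $y$), so the best deviation is attained at a pure strategy. It is applied to the Prisoner's Dilemma of Example 4.1.1.

```python
def expected(M, x, y):
    """Compute x^T M y for a payoff matrix M given as a list of rows."""
    return sum(x[i] * M[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))

def is_nash(A, B, x, y, tol=1e-9):
    """Check the two inequalities of Definition 4.2.1 on pure deviations."""
    m, n = len(A), len(A[0])
    u1, u2 = expected(A, x, y), expected(B, x, y)
    rows = [[1.0 if k == i else 0.0 for k in range(m)] for i in range(m)]
    cols = [[1.0 if k == j else 0.0 for k in range(n)] for j in range(n)]
    return (all(expected(A, e, y) <= u1 + tol for e in rows) and
            all(expected(B, x, e) <= u2 + tol for e in cols))

# Prisoner's Dilemma (actions in order: silent, confess).
A = [[-1, -10], [0, -8]]   # payoffs to prisoner I
B = [[-1, 0], [-10, -8]]   # payoffs to prisoner II

print(is_nash(A, B, [0, 1], [0, 1]))  # both confess
print(is_nash(A, B, [1, 0], [1, 0]))  # both stay silent
```

Under these payoffs the first check should succeed and the second should fail, since each prisoner gains by switching from silence to confession.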


One reason that Nash equilibria are important is that any strategy profile that 
is not a Nash equilibrium is, by definition, unstable: There is always at least one 
player who prefers to switch strategies. We will see that there always exists a Nash 
equilibrium; however, there can be many of them, and they may yield different 
payoffs to the players. Thus, Nash equilibria do not have the predictive power in 
general-sum games that safety strategies have in zero-sum games. See the notes for 
a discussion of critiques of Nash equilibria. 


EXAMPLE 4.2.2 (Cheetahs and Antelopes). Two cheetahs are chasing a pair 
of antelopes, one large and one small. Each cheetah has two possible strategies: 
Chase the large antelope (L) or chase the small antelope (S). The cheetahs will 
catch any antelope they choose, but if they choose the same one, they must share 
the spoils. Otherwise, the catch is unshared. The large antelope is worth $\ell$ and the
small one is worth $s$. Here is the payoff matrix:

                       cheetah II
                    L                      S
cheetah I   L   $(\ell/2, \ell/2)$     $(\ell, s)$
            S   $(s, \ell)$            $(s/2, s/2)$


² In examples, we write one matrix whose entries are pairs $(a_{ij}, b_{ij})$.


FIGURE 4.3. Cheetahs deciding whether to chase the large or the small antelope. 


If the larger antelope is worth at least twice as much as the smaller ($\ell \ge 2s$),
then strategy L dominates strategy S. Hence each cheetah should just chase the
larger antelope. If $s < \ell < 2s$, then there are two pure Nash equilibria, (L, S) and
(S, L). These pay off quite well for both cheetahs, but how would two healthy
cheetahs agree which should chase the smaller antelope? Therefore it makes sense
to look for symmetric mixed equilibria.

If the first cheetah chases the large antelope with probability $x$, then the expected
payoff to the second cheetah from chasing the larger antelope is

$L(x) = \frac{\ell}{2} x + (1 - x)\ell$,

and the expected payoff from chasing the smaller antelope is

$S(x) = x s + (1 - x)\frac{s}{2}$.

These expected payoffs are equal when

$x = x^* := \frac{2\ell - s}{\ell + s}$. (4.1)

For any other value of $x$, the second cheetah would prefer either the pure strategy
L or the pure strategy S, and then the first cheetah would do better by simply
playing pure strategy S or pure strategy L. But if both cheetahs chase the large
antelope with probability $x^*$ in (4.1), then neither has an incentive to deviate, so
this is a (symmetric) Nash equilibrium.
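
Formula (4.1) can be checked with exact arithmetic. The following Python sketch is only an illustration; the function names mirror the notation $L$, $S$, and $x^*$ of the text, and the values $\ell = 8$, $s = 6$ are those used in Figure 4.4.

```python
from fractions import Fraction

def L(x, ell, s):
    """Expected payoff from chasing the large antelope."""
    return Fraction(ell, 2) * x + (1 - x) * ell

def S(x, ell, s):
    """Expected payoff from chasing the small antelope."""
    return x * s + (1 - x) * Fraction(s, 2)

def x_star(ell, s):
    """The indifference probability from (4.1)."""
    return Fraction(2 * ell - s, ell + s)

ell, s = 8, 6
xs = x_star(ell, s)
print(xs, L(xs, ell, s), S(xs, ell, s))  # equal payoffs at x*

# If the opponent is greedier than x*, chasing the small antelope is better.
x_high = Fraction(6, 7)
print(S(x_high, ell, s) > L(x_high, ell, s))
```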

There is a fascinating connection between symmetric mixed Nash equilibria in 
games such as this and equilibria in biological populations. Consider a population 
of cheetahs, and suppose a fraction x of them are greedy (i.e., play strategy L). 
Each time a cheetah plays this game, he plays it against a random cheetah in the 
population. Then a greedy cheetah obtains an expected payoff of L(x), whereas a 
nongreedy cheetah obtains an expected payoff of $S(x)$. If $x > x^*$, then $S(x) > L(x)$
and nongreedy cheetahs have an advantage over greedy cheetahs. On the other
hand, if $x < x^*$, greedy cheetahs have an advantage. See Figure 4.4. Altogether,
the population seems to be pushed by evolution towards the symmetric mixed
Nash equilibrium $(x^*, 1 - x^*)$. Indeed, such phenomena have been observed in real
biological systems. The related notion of an evolutionarily stable strategy is
formalized later in the book.


FIGURE 4.4. $L(x) = 8 - 4x$ (respectively, $S(x) = 3 + 3x$) is the payoff to cheetah II from
chasing the large antelope worth $\ell = 8$ (respectively, the small antelope worth $s = 6$),
plotted against $x = P(\text{cheetah I chases large antelope})$.


EXAMPLE 4.2.3 (Chicken). Two drivers speed head-on toward each other and 
a collision is bound to occur unless one of them chickens out and swerves at the 
last minute. If both swerve, everything is OK (in this case, they both get a payoff 
of 1). If one chickens out and swerves, but the other does not, then it is a great 
success for the player with iron nerves (yielding a payoff of 2) and a great disgrace 
for the chicken (a penalty of 1). If both players have iron nerves, disaster strikes 
(and both incur a large penalty M). 


                           player II
                        Swerve (S)   Drive (D)
player I   Swerve (S)   (1, 1)       (-1, 2)
           Drive (D)    (2, -1)      (-M, -M)


There are two pure Nash equilibria in this game, (S, D) and (D, S): if one 
player knows with certainty that the other will drive on (respectively, swerve), that 
player is better off swerving (respectively, driving on). 

To determine the mixed equilibria, suppose that player I plays S with probability
$x$ and D with probability $1 - x$. This presents player II with expected payoffs of
$x + (1-x) \cdot (-1)$, i.e., $2x - 1$ if he plays S, and $2x + (1-x) \cdot (-M) = (M+2)x - M$
if he plays D. We seek an equilibrium where player II has positive probability on
each of S and D. Thus,

$2x - 1 = (M+2)x - M$; i.e., $x = 1 - \frac{1}{M}$.

The resulting payoff for player II is $2x - 1 = 1 - 2/M$.


FIGURE 4.5. The game of Chicken.
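
The indifference computation above is easy to tabulate for several values of $M$. The sketch below is illustrative; the function name is hypothetical, and it assumes an integer penalty $M > 1$ so that $x = 1 - 1/M$ lies strictly between 0 and 1. It also shows the equilibrium payoff $1 - 2/M$ growing as the crash penalty grows.

```python
from fractions import Fraction

def chicken_mixed_equilibrium(M):
    """Swerve probability and both action payoffs at the mixed equilibrium."""
    x = 1 - Fraction(1, M)           # opponent swerves with probability x
    swerve = x * 1 + (1 - x) * (-1)  # payoff of S against the mix
    drive = x * 2 + (1 - x) * (-M)   # payoff of D against the mix
    return x, swerve, drive

payoffs = []
for M in (2, 10, 100):
    x, swerve, drive = chicken_mixed_equilibrium(M)
    payoffs.append(swerve)
    print(M, x, swerve, drive)
```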


REMARKS 4.2.4. 


(1) Even though both payoff matrices decrease as $M$ increases, the equilibrium
payoffs increase. This contrasts with zero-sum games where decreasing a
player’s payoff matrix can only lower her expected payoff in equilibrium.

(2) The payoff for a player is lower in the symmetric Nash equilibrium than it
is in the pure equilibrium where that player plays D and the other plays
S. One way for a player to ensure³ that the higher payoff asymmetric
Nash equilibrium is reached is to irrevocably commit to the strategy D,
for example, by ripping out the steering wheel and throwing it out of the
car. In this way, it becomes impossible for him to chicken out, and if the
other player sees this and believes her eyes, then she has no other choice
but to chicken out.

In a number of games, making this kind of binding commitment
pushes the game into a pure Nash equilibrium, and the nature of that
equilibrium strongly depends on who managed to commit first. Here, the
payoff for the player who did not make the commitment is lower than the
payoff in the unique mixed Nash equilibrium, while in some games it is
higher (e.g., see the Battle of the Sexes in §7.2).

(3) An amusing real-life example of commitments in a different game⁴ arises
in a certain narrow two-way street in Jerusalem. Only one car at a time
can pass. If two cars headed in opposite directions meet in the street,
the driver that can signal to the opponent that he will not yield will
convincingly force the other to back out. Some drivers carry a newspaper
with them, which they can strategically pull out to signal that they are
not in a rush.


³ This assumes rationality of the other player, a dubious assumption for people who play
Chicken.

⁴ See War of Attrition in §14.4.3.


FIGURE 4.6. Ripping out the steering wheel is a binding commitment in Chicken.


FIGURE 4.7. A driver signaling that he has all the time in the world. 


4.3. General-sum games with more than two players 


We now consider general-sum games with more than two players and generalize
the notion of Nash equilibrium to this setting. Each player $i$ has a set $S_i$ of pure
strategies. We are given payoff or utility functions $u_i : S_1 \times S_2 \times \cdots \times S_k \to \mathbb{R}$,
for each player $i$, where $i \in \{1,\dots,k\}$. If player $j$ plays strategy $s_j \in S_j$ for each
$j \in \{1,\dots,k\}$, then player $i$ has a payoff or utility of $u_i(s_1, \dots, s_k)$.


EXAMPLE 4.3.1 (One Hundred Gnus and a Lioness). A lioness is chasing 
one hundred gnus. It seems clear that if the gnus cooperated, they could chase the 
lioness away, but typically they do not. Indeed, for all the gnus to run away is a
Nash equilibrium, since it would be suicidal for just one of them to confront the 
lioness. On the other hand, cooperating to attack the lioness is not an equilibrium, 
since an individual gnu would be better off letting the other ninety-nine attack. 


FIGURE 4.8. One Hundred Gnus and a Lioness. 


EXAMPLE 4.3.2 (Pollution game). Three firms will either pollute a lake in 
the following year or purify it. They pay 1 unit to purify, but it is free to pollute. 
If two or more pollute, then the water in the lake is useless, and each firm must 
pay 3 units to obtain the water that they need from elsewhere. If at most one firm 
pollutes, then the water is usable, and the firms incur no further costs. 

If firm III purifies, the cost matrix (cost = $-$payoff) is

                       firm II
                    purify      pollute
firm I   purify     (1, 1, 1)   (1, 0, 1)
         pollute    (0, 1, 1)   (3, 3, 4)

If firm III pollutes, then it is

                       firm II
                    purify      pollute
firm I   purify     (1, 1, 0)   (4, 3, 3)
         pollute    (3, 4, 3)   (3, 3, 3)


FIGURE 4.9. Three firms deciding whether to pollute a lake or not.


To discuss the game, we generalize the notion of Nash equilibrium to games 
with more players. 


DEFINITION 4.3.3. For a vector $s = (s_1, \dots, s_n)$, we use $s_{-i}$ to denote the
vector obtained by excluding $s_i$; i.e.,

$s_{-i} = (s_1, \dots, s_{i-1}, s_{i+1}, \dots, s_n)$.

We interchangeably refer to the full vector $(s_1, \dots, s_n)$ as either $s$, or, slightly
abusing notation, $(s_i, s_{-i})$.


DEFINITION 4.3.4. A pure Nash equilibrium in a $k$-player game is a sequence
of pure strategies $(s_1^*, \dots, s_k^*) \in S_1 \times \cdots \times S_k$ such that for each player $j \in \{1,\dots,k\}$
and each $s_j \in S_j$, we have

$u_j(s_j^*, s_{-j}^*) \ge u_j(s_j, s_{-j}^*)$.

In other words, for each player $j$, his selected strategy $s_j^*$ is a best response to the
selected strategies $s_{-j}^*$ of the other players.


DEFINITION 4.3.5. A (mixed) strategy profile in a k-player game is a sequence
$(x_1, \ldots, x_k)$, where $x_j \in \Delta_{|S_j|}$ is a mixed strategy for player j. A mixed
Nash equilibrium is a strategy profile $(x_1^*, \ldots, x_k^*)$ such that for each player
$j \in \{1, \ldots, k\}$ and each probability vector $x_j \in \Delta_{|S_j|}$, we have

    $u_j(x_j^*, x_{-j}^*) \ge u_j(x_j, x_{-j}^*)$.

Here

    $u_j(x_1, x_2, \ldots, x_k) = \sum_{s_1 \in S_1, \ldots, s_k \in S_k} x_1(s_1) \cdots x_k(s_k)\, u_j(s_1, \ldots, s_k)$,

where $x_i(s)$ is the probability that player i assigns to pure strategy s in the mixed
strategy $x_i$.
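
For small games, the sum defining the expected payoff can be evaluated directly. The following Python sketch is only an illustration: the function name `expected_utility`, the representation of mixed strategies as dictionaries, and the matching-pennies payoff used as a check are all assumptions made here, not notation from the text.

```python
from itertools import product

def expected_utility(j, mixed, payoff):
    """Expected payoff u_j(x_1, ..., x_k) of player j.

    mixed  -- list of dicts; mixed[i][s] is the probability x_i(s)
    payoff -- function mapping a pure profile (tuple) to a tuple of payoffs
    """
    total = 0.0
    # Sum over all pure profiles (s_1, ..., s_k) in S_1 x ... x S_k.
    for profile in product(*(x.keys() for x in mixed)):
        prob = 1.0
        for i, s in enumerate(profile):
            prob *= mixed[i][s]
        total += prob * payoff(profile)[j]
    return total

# Hypothetical two-player check: matching pennies.
def pennies(profile):
    a, b = profile
    return (1, -1) if a == b else (-1, 1)

uniform = {"H": 0.5, "T": 0.5}
value = expected_utility(0, [uniform, uniform], pennies)
```

With both players mixing uniformly, `value` is 0, as expected for this zero-sum example.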


We will prove the following result later.

THEOREM 4.3.6 (Nash’s Theorem). Every finite general-sum game has a
Nash equilibrium.


For determining Nash equilibria in (small) games, the following lemma (which 
we have already applied several times) is useful. 


LEMMA 4.3.7. Consider a k-player game where $x_i$ is the mixed strategy of player
i. For each i, let $T_i = \{s \in S_i \mid x_i(s) > 0\}$. Then $(x_1, \ldots, x_k)$ is a Nash equilibrium
if and only if for each i, there is a constant $c_i$ such that (footnote 5)

    $\forall s_i \in T_i:\ u_i(s_i, x_{-i}) = c_i$   and   $\forall s_i \notin T_i:\ u_i(s_i, x_{-i}) \le c_i$.

EXERCISE 4.a. Prove Lemma 4.3.7.


REMARK 4.3.8. Lemma 4.3.7 extends the equivalence (i) ⇔ (ii) of


Returning to the Pollution game, it is easy to check that there are two kinds of pure
equilibria. The first consists of all three firms polluting, resulting in a cost of 3 to
each player, and the second consists of two firms purifying (at cost 1 each) and one
firm polluting (at no cost); there are three equilibria of the second kind, one for each
choice of the polluting firm. The symmetric polluting equilibrium is an example of
the Tragedy of the Commons (footnote 6). All three firms would prefer any of the asymmetric
equilibria, but cannot unilaterally transition to these equilibria.
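
The pure equilibria of the Pollution game can be confirmed by brute force over the eight pure profiles. This is a minimal sketch: the names `cost`, `is_pure_nash`, and `pure_equilibria` are illustrative choices made here, and the cost rule is transcribed from Example 4.3.2.

```python
from itertools import product

PURIFY, POLLUTE = "purify", "pollute"

def cost(i, profile):
    """Cost to firm i in the Pollution game (Example 4.3.2)."""
    c = 1 if profile[i] == PURIFY else 0      # purifying costs 1 unit
    if profile.count(POLLUTE) >= 2:           # the lake is useless
        c += 3
    return c

def is_pure_nash(profile):
    """True if no firm can strictly lower its cost by a unilateral switch."""
    for i in range(3):
        for alt in (PURIFY, POLLUTE):
            deviated = profile[:i] + (alt,) + profile[i + 1:]
            if cost(i, deviated) < cost(i, profile):
                return False
    return True

pure_equilibria = [p for p in product((PURIFY, POLLUTE), repeat=3)
                   if is_pure_nash(p)]
```

The list contains the all-pollute profile and the three profiles with exactly one polluter.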

Next we consider mixed strategies. First, observe that if player III purifies,
then it is a best response for each of players I and II to purify with probability 2/3
and pollute with probability 1/3. Conversely, it is a best response for player III to
purify, since his cost is 1·8/9 + 4·1/9 = 12/9 for purifying, but 0·4/9 + 3·5/9 = 15/9
for polluting. Similarly, there are Nash equilibria where player I (resp. player II)
purifies and the other two mix in the proportions (2/3, 1/3).

Finally, we turn to fully mixed strategies. Suppose that player i’s strategy is
$x_i = (p_i, 1 - p_i)$ (that is, i purifies with probability $p_i$). It follows from Lemma
4.3.7 that these strategies are a Nash equilibrium with $0 < p_i < 1$ if and only if

    $u_i(\text{purify}, x_{-i}) = u_i(\text{pollute}, x_{-i})$.

Thus, if $0 < p_1 < 1$, then

    $p_2 p_3 + p_2(1 - p_3) + p_3(1 - p_2) + 4(1 - p_2)(1 - p_3)$
        $= 3p_2(1 - p_3) + 3p_3(1 - p_2) + 3(1 - p_2)(1 - p_3)$,

or, equivalently,

    $1 = 3(p_2 + p_3 - 2p_2 p_3)$.    (4.2)

Similarly, if $0 < p_2 < 1$, then

    $1 = 3(p_1 + p_3 - 2p_1 p_3)$,    (4.3)

and if $0 < p_3 < 1$, then

    $1 = 3(p_1 + p_2 - 2p_1 p_2)$.    (4.4)


5 The notation $(s_i, x_{-i})$ is an abbreviation where we identify the pure strategy $s_i$ with the
probability vector $\mathbf{1}_{s_i}$ that assigns $s_i$ probability 1.

6 In games of this type, individuals acting in their own self-interest deplete a shared resource
(in this case, clean water), thereby making everybody worse off. See also Example 4.5.1.

Subtracting (4.3) from (4.4), we get $0 = 3(p_2 - p_3)(1 - 2p_1)$. This means that
if all three firms use mixed strategies, then either $p_2 = p_3$ or $p_1 = 1/2$. In the
first case ($p_2 = p_3$), equation (4.2) becomes quadratic in $p_2$, with two solutions
$p_2 = p_3 = (3 \pm \sqrt{3})/6$, both in (0,1). Substituting these solutions into the other
equations yields $p_1 = p_2 = p_3$, resulting in two symmetric mixed equilibria. If,
instead of $p_2 = p_3$, we let $p_1 = 1/2$, then (4.3) becomes 1 = 3/2, which is nonsense.
This means that there is no asymmetric equilibrium with three mixed (non-pure)
strategies. One can also check that there is no equilibrium with two pure and
one non-pure strategy. Thus the set of Nash equilibria consists of one symmetric
and three asymmetric pure equilibria, three equilibria where one player has a pure
strategy and the other two play the same mixed strategy, and two symmetric mixed
equilibria.
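
The mixed equilibria listed above can be checked numerically against the support conditions of Lemma 4.3.7. The sketch below is illustrative only: the helper names and the tolerance `tol` are assumptions made here, and the two cost functions simply restate the rules of the Pollution game for a firm whose opponents purify with probabilities q and r.

```python
from math import sqrt

def purify_cost(q, r):
    """Expected cost of purifying when the other two purify w.p. q and r."""
    both_pollute = (1 - q) * (1 - r)
    return 1 + 3 * both_pollute

def pollute_cost(q, r):
    """Expected cost of polluting when the other two purify w.p. q and r."""
    both_purify = q * r
    return 3 * (1 - both_purify)

def is_equilibrium(p, tol=1e-9):
    """Check the conditions of Lemma 4.3.7 for p = (p1, p2, p3)."""
    for i in range(3):
        q, r = (p[j] for j in range(3) if j != i)
        c_pur, c_pol = purify_cost(q, r), pollute_cost(q, r)
        if p[i] > tol and c_pur > c_pol + tol:       # purify is used but worse
            return False
        if p[i] < 1 - tol and c_pol > c_pur + tol:   # pollute is used but worse
            return False
    return True

roots = [(3 + sqrt(3)) / 6, (3 - sqrt(3)) / 6]
symmetric_mixed = [(p, p, p) for p in roots]
one_pure = [(1, 2/3, 2/3), (2/3, 1, 2/3), (2/3, 2/3, 1)]
```

Every profile in `symmetric_mixed` and `one_pure` passes `is_equilibrium`, while a profile such as (1/2, 1/2, 1/2) does not.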

The most reasonable interpretation of the symmetric mixed strategies involves
population averages, as in Example 4.2.2. If p denotes the fraction of firms that
purify, then the only stable values of p are $(3 \pm \sqrt{3})/6$, as shown in Exercise 4.15.


4.3.1. Symmetric games. With the exception of Example 4.1.4, all of the
games in this chapter so far are symmetric. Indeed, they are unchanged by any
relabeling of the players. Here is a game with more restricted symmetry.


EXAMPLE 4.3.9 (Location-sensitive Pollution). Four firms are located around
a lake. Each one chooses to pollute or purify. It costs 1 unit to purify, but it is
free to pollute. But if a firm i and its two neighbors i ± 1 (mod 4) pollute, then the
water is unusable to i and $u_i = -3$.

This game is symmetric under rotation. In particular, $u_1(s_1, s_2, s_3, s_4) =
u_2(s_4, s_1, s_2, s_3)$. Consequently, $u_1(s, x, x, x) = u_2(x, s, x, x)$ for all pure strategies
s and mixed strategies x.


This motivates the following general definition. 


DEFINITION 4.3.10. Suppose that all players in a k-player game have the same
set of pure strategies S. Denote by $u_j(s; x)$ the utility of player j when he plays
pure strategy $s \in S$ and all other players play the mixed strategy x. We say the
game is symmetric if

    $u_i(s; x) = u_j(s; x)$

for every pair of players i, j, pure strategy s, and mixed strategy x.

We will prove the following proposition later.


PROPOSITION 4.3.11. In a symmetric game, there is a symmetric Nash equi- 
librium (where all players use the same strategy). 


4.4. Potential games 


In this section, we consider a class of games that always have a pure Nash 
equilibrium. Moreover, we shall see that a Nash equilibrium in such a game can be 
reached by a series of best-response moves. We begin with an example. 


EXAMPLE 4.4.1 (Congestion Game). There is a road network with R roads
and k drivers, where the j-th driver wishes to drive from point $s_j$ to point $t_j$. Each
driver, say the j-th, chooses a path $P_j$ from $s_j$ to $t_j$ and incurs a cost or latency due
to the congestion on the path selected.

This cost is determined as follows. Suppose that the paths selected by the k
drivers are $P = (P_1, P_2, \ldots, P_k)$. For each road r, let $n_r(P)$ be the number of
drivers j that use r; i.e., r is on $P_j$. Denote by $c_r(n)$ the cost incurred by a driver
using road r when n drivers use that road. The total cost incurred by driver i
taking path $P_i$ is the sum of the costs on the roads he uses; i.e.,

    $\mathrm{cost}_i(P) = \sum_{r \in P_i} c_r(n_r(P))$.


Imagine adding the players one at a time and looking at the cost each player
incurs at the moment he is added. We claim that the sum of these quantities (footnote 7) is

    $\phi(P) := \sum_{r=1}^{R} \sum_{\ell=1}^{n_r(P)} c_r(\ell)$.    (4.5)

Indeed, if player i is the last one to be added, the claim follows by induction from
the identity

    $\phi(P) = \phi(P_{-i}) + \mathrm{cost}_i(P)$.    (4.6)

Moreover, $\phi(P)$ does not depend on the order in which the players are added, and
(4.6) holds for all i. Thus, any player can be viewed as the last player.


COROLLARY 4.4.2. Let $\phi$ be defined by (4.5). Fix a strategy profile $P =
(P_1, \ldots, P_k)$. If player i switches from path $P_i$ to an alternative path $P_i'$, then
the change in the value of $\phi$ equals the change in the cost he incurs:

    $\phi(P_i', P_{-i}) - \phi(P) = \mathrm{cost}_i(P_i', P_{-i}) - \mathrm{cost}_i(P)$.    (4.7)

(See Figure 4.10.)

We call $\phi$ a potential function for this congestion game. Corollary 4.4.2 implies
that the game has a pure Nash equilibrium: If P minimizes $\phi$, then it is a Nash
equilibrium, since by (4.7) we have $\mathrm{cost}_i(P_i', P_{-i}) - \mathrm{cost}_i(P) = \phi(P_i', P_{-i}) - \phi(P) \ge 0$
for every player i and every alternative path $P_i'$.
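
The identity (4.7) is easy to test on a small instance. In this sketch the road names, the quadratic latency `c`, and the helper names `loads`, `cost`, and `potential` are hypothetical choices made for illustration; only the formulas for the cost and for the potential (4.5) come from the text.

```python
from collections import Counter

def loads(paths):
    """n_r(P): number of drivers using each road r."""
    return Counter(r for path in paths for r in path)

def cost(i, paths, c):
    """cost_i(P): sum of c(r, n_r(P)) over the roads r on driver i's path."""
    n = loads(paths)
    return sum(c(r, n[r]) for r in paths[i])

def potential(paths, c):
    """phi(P) from (4.5): for each road r, add c(r, 1) + ... + c(r, n_r(P))."""
    n = loads(paths)
    return sum(c(r, l) for r in n for l in range(1, n[r] + 1))

# Hypothetical latency: the square of the load, the same on every road.
def c(r, n):
    return n * n

P = [("ab",), ("ab",), ("ab",)]             # all three drivers share road ab
P_new = [("ab",), ("ab",), ("ac", "cb")]    # the third driver takes a detour

d_phi = potential(P_new, c) - potential(P, c)
d_cost = cost(2, P_new, c) - cost(2, P, c)
```

Here the potential drops from 14 to 7 and the switching driver's cost drops from 9 to 2, so both differences equal -7.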


[Figure: a three-road network shown before and after the switch.
Before: $\phi = c(1) + c(1) + c(1) + c(2) + c(3)$.
After: $\phi = c(1) + c(1) + c(1) + c(2) + c(2) + c(2)$.]

FIGURE 4.10. In this example, the cost is the same on all three roads, and is
c(n) if the number of drivers is n. The figure shows the potential $\phi$ before and
after the player going from c to b switches from the direct path to the indirect
path through a. The change in potential is the change in cost experienced by
this player.


7 Note that $\phi(P)$ is not the total cost incurred by the players.

4.4.1. The general notion. The congestion game we just saw is an example
of a potential game. More generally, consider a k-player game, in which player j’s
strategy space is the finite set $S_j$. Let $u_i(s_1, s_2, \ldots, s_k)$ denote the payoff to player
i when player j plays strategy $s_j$ for each $j \in \{1, \ldots, k\}$. In a potential game, there
is a function $\psi : S_1 \times \cdots \times S_k \to \mathbb{R}$ such that for each i and $s_{-i} \in S_{-i}$,

    $\psi(s_i, s_{-i}) - u_i(s_i, s_{-i})$ is independent of $s_i$.    (4.8)

Equivalently, for each i, $s_i, \tilde{s}_i \in S_i$, and $s_{-i} \in S_{-i}$,

    $\psi(\tilde{s}_i, s_{-i}) - \psi(s_i, s_{-i}) = u_i(\tilde{s}_i, s_{-i}) - u_i(s_i, s_{-i})$.    (4.9)

We call the function $\psi$ a potential function associated with the game.


CLAIM 4.4.3. Every potential game has a Nash equilibrium in pure strategies. 


PROOF. The set $S_1 \times \cdots \times S_k$ is finite, so there exists some s that maximizes
$\psi(s)$. Note that for this s, the expression on the right-hand side in (4.9) is at most
zero for any $i \in \{1, \ldots, k\}$ and any choice of $\tilde{s}_i$. This implies that s is a Nash
equilibrium.

REMARK 4.4.4. Clearly, it suffices that s is a local maximum; i.e., for all i and
$\tilde{s}_i \in S_i$,

    $\psi(s) \ge \psi(\tilde{s}_i, s_{-i})$.


In Example 4.4.1 the game was more naturally described in terms of agents
trying to minimize their costs; i.e.,

    $u_i(s) = -\mathrm{cost}_i(s)$.

In potential games with costs, as we saw above, it is more convenient to have
the potential function decrease as the cost of each agent decreases. Thus, $\phi =
-\psi$ is a potential for a game with given cost functions if $\psi$ is a potential for the
corresponding utilities $u_i$.


REMARK 4.4.5. If a function $\psi$ is defined for strategy profiles of k − 1 as well
as k players and satisfies

    $\psi(s) = \psi(s_{-i}) + u_i(s)$   for all s and all i,    (4.10)

then $\psi$ is a potential function; i.e., (4.8) holds. Note that this held in congestion
games. In fact, every potential function has such an extension. See Exercise 4.19.
This suggests a recipe for constructing potential functions in games which are well-
defined for any number of players: Let players join the game one at a time and add
the utility of each player when he joins.


4.4.1.1. Repeated play dynamics. Consider a set of k players repeatedly playing 
a game starting with some initial strategy profile. In each step, exactly one player 
changes strategy and (strictly) improves his payoff. When no such improvement 
is possible, the process stops and the strategy profile reached must be a Nash 
equilibrium. In general, such a process might continue indefinitely, e.g., Rock, 
Paper, Scissors. 


PROPOSITION 4.4.6. In a potential game, the repeated play dynamics above
terminate in a Nash equilibrium.

PROOF. Equation (4.9) implies that in each improving move, the utility of the
player that switches actions and the potential $\psi$ increase by the same amount.
Since there are finitely many strategy profiles s, and in each improving move $\psi(s)$
increases, at some point a (local) maximum is reached and no player can increase
his utility by switching strategies. In other words, a Nash equilibrium is reached.


In the important special case where each player chooses his best improving 
move, the repeated play process above is called best-response dynamics. 
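
The repeated play process can be sketched in a few lines. The code below is an illustration under stated assumptions: the function name `improving_dynamics`, the `max_steps` safeguard, and the small two-road congestion game used as input are all hypothetical choices, not part of the text. In each step exactly one player moves, and he moves to his best strictly improving strategy.

```python
def improving_dynamics(profile, strategy_sets, utility, max_steps=10_000):
    """Repeatedly let one player make his best strictly improving move.

    profile       -- initial pure profile (tuple)
    strategy_sets -- strategy_sets[i] lists player i's pure strategies
    utility       -- utility(i, profile) returns player i's payoff
    Returns the final profile and the number of moves made.
    """
    profile = tuple(profile)
    for step in range(max_steps):
        moved = False
        for i, S_i in enumerate(strategy_sets):
            def u(s):
                return utility(i, profile[:i] + (s,) + profile[i + 1:])
            best = max(S_i, key=u)
            if u(best) > u(profile[i]):          # strict improvement only
                profile = profile[:i] + (best,) + profile[i + 1:]
                moved = True
                break                            # exactly one player moves
        if not moved:
            return profile, step                 # no improving move is left
    raise RuntimeError("did not terminate (the game may have no potential)")

# Hypothetical potential game: three drivers, two roads, cost = load.
def utility(i, profile):
    return -profile.count(profile[i])

final, moves = improving_dynamics((0, 0, 0), [[0, 1]] * 3, utility)
```

Starting with all three drivers on road 0, a single move (one driver switching to road 1) reaches a profile from which no one can strictly improve.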


4.4.2. Additional examples. 


EXAMPLE 4.4.7 (Consensus). Consider a finite undirected graph G = (V, E).
In this game, each vertex $i \in V = \{1, \ldots, n\}$ is a player, and her action consists of
choosing a bit in {0,1}. We represent vertex i’s choice by $b_i \in \{0,1\}$. Let N(i) be
the set of neighbors of i in G and write $b = (b_1, \ldots, b_n)$. The loss $D_i(b)$ for player
i is the number of neighbors that she disagrees with; i.e.,

    $D_i(b) = \sum_{j \in N(i)} |b_i - b_j|$.

For example, the graph could represent a social network for a set of people, each
deciding whether to go to Roxy or Hush (two nightclubs); each person wants to go
to the club where most of his or her friends will be partying.

We define $\phi(b) = \frac{1}{2} \sum_i D_i(b)$ and observe that this counts precisely the number
of edges on which there is a disagreement. This implies that $\phi(\cdot)$ is a potential
function for this game. Indeed, if we define $\phi(b_{-i})$ to be the number of disagreements
on edges not incident to i, then $\phi(b) = \phi(b_{-i}) + D_i(b)$. Therefore, a series of improving
moves, where exactly one player moves in each round, terminates in a pure Nash
equilibrium. Two of the Nash equilibria are when all players are in agreement, but
in some graphs there are other equilibria.
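
A short simulation illustrates the potential argument for the Consensus game. This is a sketch only: the path graph, the starting bits, and the function names are assumptions made here. Players move one at a time, and a player flips her bit only when this strictly lowers her loss.

```python
def disagreements(i, b, nbrs):
    """D_i(b): number of neighbours of i whose bit differs from b[i]."""
    return sum(abs(b[i] - b[j]) for j in nbrs[i])

def phi(b, nbrs):
    """Half the sum of the D_i, i.e. the number of disagreeing edges."""
    return sum(disagreements(i, b, nbrs) for i in nbrs) // 2

def sequential_dynamics(b, nbrs):
    """One player at a time flips her bit if that strictly lowers D_i."""
    b = dict(b)
    improved = True
    while improved:
        improved = False
        for i in nbrs:
            flipped = {**b, i: 1 - b[i]}
            if disagreements(i, flipped, nbrs) < disagreements(i, b, nbrs):
                b = flipped
                improved = True
    return b

# Hypothetical graph: the path 1 - 2 - 3 - 4 with alternating bits.
path = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3]}
start = {1: 0, 2: 1, 3: 0, 4: 1}
end = sequential_dynamics(start, path)
```

On this instance the potential falls from 3 to 0 and the process stops with all four players choosing bit 1.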


FIGURE 4.11. This is a Nash equilibrium in the Consensus game.


Now consider what happens when all players that would improve their payoff by
switching their bit do so simultaneously. In this case, the process might continue
indefinitely. However, it will converge to a cycle of period at most two; i.e., it
either stabilizes or alternates between two bit vectors. To see this, suppose that
the strategy profile at time t is $b^t = (b_1^t, b_2^t, \ldots, b_n^t)$. Let

    $f_t = \sum_i \sum_{j \in N(i)} |b_i^t - b_j^{t-1}|$.

Observe that

    $\sum_{j \in N(i)} |b_i^{t+1} - b_j^t| \le \sum_{j \in N(i)} |b_i^{t-1} - b_j^t|$,

since $b_i^{t+1}$ is chosen at time t + 1 to minimize the left-hand side. Moreover, equality
holds only if $b_i^{t+1} = b_i^{t-1}$. Summing over i shows that $f_{t+1} \le f_t$ (the right-hand
sides sum to $f_t$ because the neighbor relation is symmetric). Thus, when f
reaches its minimum, we must have $b_i^{t+1} = b_i^{t-1}$ for all i.
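
The period-two behavior is already visible on a single edge. The sketch below assumes that a player whose neighbors are evenly split keeps her current bit, since a tie offers no strict improvement; the graph, the function names, and the cycle-detection loop are illustrative choices made here.

```python
def simultaneous_step(b, nbrs):
    """Every player who strictly gains by flipping does so at the same time."""
    new = {}
    for i in nbrs:
        ones = sum(b[j] for j in nbrs[i])
        zeros = len(nbrs[i]) - ones
        if ones > zeros:
            new[i] = 1
        elif zeros > ones:
            new[i] = 0
        else:
            new[i] = b[i]        # a tie gives no strict improvement
    return new

def eventual_period(b, nbrs):
    """Run until a profile repeats and return the length of the cycle."""
    seen = {}
    t = 0
    while True:
        key = tuple(sorted(b.items()))
        if key in seen:
            return t - seen[key]
        seen[key] = t
        b = simultaneous_step(b, nbrs)
        t += 1

edge = {1: [2], 2: [1]}                      # hypothetical two-vertex graph
period = eventual_period({1: 0, 2: 1}, edge)
```

The two endpoints keep swapping bits, so the profile alternates between (0,1) and (1,0) and `period` is 2; starting from agreement the period is 1.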


REMARK 4.4.8. To prove that a game has a pure Nash equilibrium that is
reached by a finite sequence of improving moves, it suffices to find a generalized
potential function, i.e., a function $\psi : S_1 \times \cdots \times S_k \to \mathbb{R}$ such that for each i and
$s_{-i} \in S_{-i}$,

    $\mathrm{sgn}\big(\psi(\tilde{s}_i, s_{-i}) - \psi(s_i, s_{-i})\big) = \mathrm{sgn}\big(u_i(\tilde{s}_i, s_{-i}) - u_i(s_i, s_{-i})\big)$    (4.11)

for all $s_i, \tilde{s}_i \in S_i$, where sgn(x) = x/|x| if x ≠ 0 and sgn(0) = 0.

EXAMPLE 4.4.9 (Graph Coloring). Consider an arbitrary undirected graph
G = (V, E) on n vertices. In this game, each vertex $v_i \in V$ is a player, and its
possible actions consist of choosing a color $s_i$ from the set $[n] := \{1, \ldots, n\}$. For
any color c, define

    $n_c(s)$ = number of vertices with color c when players color according to s.

The payoff of a vertex $v_i$ (with color $s_i$) is equal to the number of vertices
with the same color (counting $v_i$ itself) if $v_i$’s color is different from that of its
neighbors, and it is 0 otherwise; i.e.,

    $u_i(s) = n_{s_i}(s)$ if no neighbor of $v_i$ has the same color as $v_i$,
    $u_i(s) = 0$ otherwise.

For example, the graph could represent a social network, where each girl wants to
wear the most popular dress (color) that none of her friends have.


FIGURE 4.12. A social network. 


Consider a series of moves in which one player at a time makes a best-response 
move. Then as soon as every player who has an improving move to make has done 
so, the graph will be properly colored; that is, no neighbors will have the same 
color. This is because a node’s payoff is positive if it doesn’t share its color with 
any neighbor and it is nonpositive otherwise. Moreover, once the graph is properly 
colored, it will never become improperly colored by a best-response move. Thus, 
we can restrict our attention to strategy profiles s in which the graph is properly 
colored.

LEMMA 4.4.10. Graph Coloring has a pure Nash equilibrium.

PROOF. We claim that, restricted to proper colorings, the function

    $\psi(s) = \sum_{c=1}^{n} \sum_{\ell=1}^{n_c(s)} \ell$

is a potential function: For any proper coloring s and any player i,

    $\psi(s) = \psi(s_{-i}) + n_{s_i}(s) = \psi(s_{-i}) + u_i(s)$;

i.e., (4.10) holds for proper colorings.


COROLLARY 4.4.11. Let $\chi(G)$ be the chromatic number of the graph G, that
is, the minimum number of colors in any proper coloring of G. Then the graph
coloring game has a pure Nash equilibrium with $\chi(G)$ colors.

PROOF. Suppose that s is a proper coloring with $\chi(G)$ colors. Then in a series
of single-player improving moves starting at s, no player will ever introduce an
additional color, and the coloring will remain proper always. In addition, since the
game is a potential game, the series of moves will end in a pure Nash equilibrium.
Thus, this Nash equilibrium will have $\chi(G)$ colors.
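
Best-response dynamics in the Graph Coloring game can be simulated directly. The sketch below is illustrative: the 4-cycle, the all-equal starting coloring, and the function names are assumptions made here, and the payoff counts a vertex in its own color class, as in the formula $u_i(s) = n_{s_i}(s)$.

```python
def payoff(i, s, nbrs):
    """u_i(s): size of i's colour class if no neighbour shares i's colour, else 0."""
    if any(s[j] == s[i] for j in nbrs[i]):
        return 0
    return sum(1 for v in s if s[v] == s[i])

def best_response_dynamics(s, nbrs):
    """One vertex at a time switches to a colour that strictly raises its payoff."""
    s = dict(s)
    n = len(nbrs)
    changed = True
    while changed:
        changed = False
        for i in nbrs:
            def u(c):
                return payoff(i, {**s, i: c}, nbrs)
            best = max(range(1, n + 1), key=u)
            if u(best) > u(s[i]):
                s[i] = best
                changed = True
    return s

def is_proper(s, nbrs):
    """True if no edge joins two vertices of the same colour."""
    return all(s[i] != s[j] for i in nbrs for j in nbrs[i])

# Hypothetical graph: the 4-cycle 1 - 2 - 3 - 4 - 1, all vertices colour 1.
cycle = {1: [2, 4], 2: [1, 3], 3: [2, 4], 4: [1, 3]}
final = best_response_dynamics({v: 1 for v in cycle}, cycle)
```

On this instance the dynamics end in a proper coloring with two colors, which is the chromatic number of the 4-cycle, and no vertex can gain by recoloring.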


4.5. Games with infinite strategy spaces 
In some cases, a player’s strategy space S; is infinite. 


EXAMPLE 4.5.1 (Tragedy of the Commons). Consider a set of k players
that each want to send data along a shared channel of maximum capacity 1. Each
player decides how much data to send along the channel, measured as a fraction
of the capacity. Ideally, a player would like to send as much data as possible. The
problem is that the quality of the channel degrades as a larger fraction of it is
utilized, and if it is over-utilized, no data gets through. In this setting, each agent’s
strategy space $S_i$ is [0,1]. The utility function of each player i is

    $u_i(s_i, s_{-i}) := s_i \big(1 - \sum_j s_j\big)$

if $\sum_j s_j \le 1$, and it is 0 otherwise.

We check that there is a pure Nash equilibrium in this game. Fix a player i and
suppose that the other players select strategies $s_{-i}$. Then player i’s best response
consists of choosing $s_i \in [0,1]$ to maximize $s_i (1 - \sum_j s_j)$, which results in

    $s_i = \big(1 - \sum_{j \ne i} s_j\big)/2$.    (4.12)

To be in Nash equilibrium, (4.12) must hold for all i. The unique solution to this
system of equations has $s_i = 1/(k + 1)$ for all i.
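
The best-response condition can be checked numerically for a particular number of players. In this sketch the choice k = 4, the function names, and the clamping of the best response at 0 (for the case where the others already fill the channel) are assumptions made here for illustration.

```python
def utility(i, s):
    """u_i(s) = s_i * (1 - sum_j s_j) if the total is at most 1, else 0."""
    total = sum(s)
    return s[i] * (1 - total) if total <= 1 else 0.0

def best_response(i, s):
    """Maximiser of s_i * (1 - s_i - others) over [0, 1]; see (4.12)."""
    others = sum(s) - s[i]
    return max(0.0, (1 - others) / 2)

k = 4
equilibrium = [1 / (k + 1)] * k      # every player sends 1/(k+1)
cooperative = [1 / (2 * k)] * k      # every player sends 1/(2k)
```

At `equilibrium` each player's best response is his current choice and his utility is 1/(k+1)^2. At `cooperative` each player earns the larger amount 1/(4k), but his best response exceeds 1/(2k), so that profile is not an equilibrium.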
This is a “tragedy” because each player’s resulting utility is $1/(k+1)^2$, whereas
if $s_i = 1/(2k)$ for all i, then each player would have utility $1/(4k)$. However, the latter
choice is not an equilibrium.

EXAMPLE 4.5.2 (Nightclub Pricing). Three neighboring colleges have n students
each that hit the nightclubs on weekends. Each of the two clubs, Roxy and
Hush, chooses a price (cover charge) in [0,1]. College A students go to Roxy, College
C students go to Hush, and College B students choose whichever of Roxy or
Hush has the lower cover charge that weekend, breaking ties in favor of Roxy. (See
Figure 4.13.)


[Figure: College A, Roxy (price $p_1$), College B (breaks ties in favor of Roxy),
Hush (price $p_2$), College C.]

FIGURE 4.13. The Nightclub Pricing game.


Thus, if Roxy sets the price at $p_1$ and Hush sets the price at $p_2$, with $p_1 \le p_2$,
then Roxy’s utility is $2np_1$ (n students from each of college A and college B pay
the price $p_1$) and Hush’s utility is $np_2$, whereas if $p_1 > p_2$, then Roxy’s utility is
$np_1$ and Hush’s utility is $2np_2$.

In this game, there is no pure Nash equilibrium. To see this, suppose that Hush
chooses a price $p_2 > 1/2$. Then Roxy’s best response is to choose $p_1 = p_2$. But
then $p_2$ is no longer a best response to $p_1$. If $p_2 = 1/2$, then Roxy’s best response
is either $p_1 = 1/2$ or $p_1 = 1$, but in either case $p_2 = 1/2$ is not a best response.
Finally, Hush will never set $p_2 < 1/2$ since this yields a payoff less than n, whereas
a payoff of n is always achievable.

There is, however, a symmetric mixed Nash equilibrium. Any pure strategy
with $p_1 < 1/2$ is dominated by the strategy $p_1 = 1$, and thus we can restrict
attention to mixed strategies supported on [1/2, 1]. Suppose that Roxy draws its
price from distribution F and Hush from distribution G, where both distributions
are continuous and supported on all of [1/2, 1]. Then the expected payoff to Hush
for any price p it might choose is

    $n\big(pF(p) + 2p(1 - F(p))\big) = np(2 - F(p))$,    (4.13)

which must (footnote 8) equal cn for some constant c and all p in [1/2, 1]. Setting p = 1 shows
that c = 1, so $F(x) = 2 - 1/x$ in [1/2, 1] (corresponding to density $f(x) = 1/x^2$ on
that interval). Setting G = F yields a Nash equilibrium. Note that the continuous
distributions ensure that the chance of a tie is zero.
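
The indifference property of this equilibrium is easy to verify numerically. The sketch below is illustrative only: the function names, the price grid, and the use of inverse-transform sampling to draw prices from F are assumptions made here.

```python
import random

def F(x):
    """Equilibrium price distribution F(x) = 2 - 1/x on [1/2, 1]."""
    if x < 0.5:
        return 0.0
    return min(1.0, 2 - 1 / x)

def hush_payoff(p, n=1):
    """Expected payoff (4.13) to Hush at price p when Roxy's price follows F."""
    return n * p * (2 - F(p))

def sample_price(rng=random):
    """Inverse-transform sampling: F(x) = u  <=>  x = 1 / (2 - u)."""
    return 1 / (2 - rng.random())

grid = [0.5 + 0.05 * i for i in range(11)]       # prices 0.50, 0.55, ..., 1.00
payoffs = [hush_payoff(p) for p in grid]
```

Every entry of `payoffs` equals n (here n = 1) up to rounding, while a price below 1/2 earns strictly less, in line with the domination argument above.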


8 We are using a continuous version of Lemma 4.3.7. If $p(2 - F(p)) > \tilde{p}(2 - F(\tilde{p}))$ for some
$p, \tilde{p} \in [1/2, 1]$, then the same inequality would hold if $p, \tilde{p}$ are slightly perturbed (by the continuity
of F), and Hush would benefit by moving mass from a neighborhood of $\tilde{p}$ to a neighborhood of p.

EXERCISE 4.b. Consider two nightclubs with cover charges $p_1 > 0$ and $p_2 > 0$
respectively. Students at the nearby college always go to the cheaper club, breaking
ties for club 1. The revenues per student will be $(p_1, 0)$ if $p_1 \le p_2$ and $(0, p_2)$ if
$p_1 > p_2$. Show that for any c > 0, there is a mixed Nash equilibrium that yields an
expected revenue of c to each club.


FIGURE 4.14. The seller, who knows the type of the car, may misrepresent it
to the buyer, who doesn’t know the type. (Drawing courtesy of Ranjit Samra;
see http://rojaysoriginalart.com.)


4.6. The market for lemons 


Economist George Akerlof won the Nobel Prize for analyzing how a used car 
market can break down in the presence of asymmetric information. Here is an 
extremely simplified version of his model. Suppose that there are cars of only two 
types: good cars (G) and lemons (L). A good car is worth $9,000 to all sellers and 
$12,000 to all buyers, while a lemon is worth only $4,000 to sellers and $6,000 to 
buyers. Obviously a seller knows what kind of car he is selling. If a buyer knew the 
type of the car being offered, he could split the difference in values with the seller 
and gain $1,500 for a good car and $1,000 for a lemon. However, a buyer doesn’t 
know what kind of car he bought: lemons and good cars are indistinguishable at 
first, and a buyer only discovers what kind of car he bought after a few weeks, when 
the lemons break down. What a buyer does know is the fraction p of cars on the 
market that are lemons. Thus, the maximum amount that a rational buyer will 
pay for a car is 6,000p + 12,000(1 − p) = f(p), and a seller who advertises a car at
f(p) − ε will sell it.

However, if p > 1/2, then f(p) < $9,000, and sellers with good cars won’t sell
them. Thus, p will increase, f(p) will decrease, and soon only lemons will be left 
on the market. In this case, asymmetric information hurts sellers with good cars, 
as well as buyers. 
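
The breakdown threshold follows from a one-line computation, sketched below. The function names and the default `seller_value` argument are illustrative assumptions; the dollar figures are those of the simplified model above.

```python
def max_price(p):
    """f(p): expected value to a buyer when a fraction p of cars are lemons."""
    return 6000 * p + 12000 * (1 - p)

def good_cars_offered(p, seller_value=9000):
    """Owners of good cars sell only if the price covers their own valuation."""
    return max_price(p) >= seller_value
```

For example, `max_price(0.5)` is exactly 9000, so good cars stay on the market only while the fraction of lemons is at most 1/2.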
Notes


The Prisoner’s Dilemma game was invented in 1950 by Flood and Dresher
[Pou11]. The game is most relevant when it is repeated. See Although Prisoner’s
Dilemma is most famous, Skyrms makes the case that Stag Hunt is more repre- 
sentative of real-life interactions. Stag Hunt and Chicken were both used as models for 
nuclear deterrence, e.g., by Schelling, who won a Nobel Prize in 2005 for “having enhanced 
our understanding of conflict and cooperation through game-theory analysis.” See 
for a survey of game theory models of peace and war. Cheetahs and Antelopes and the 
Pollution Game are taken from [Gin00]. One Hundred Gnus and a Lioness is from [Sha17]. 
Finding Nash equilibria in multiplayer games often involves solving systems of polynomial 
equations as, for example, in the Pollution Game. An excellent survey of this topic is 
in [Stu02]. The Driver and Parking Inspector Game is from [TvS02].

Congestion games and best-response dynamics therein were introduced and analyzed 
by Rosenthal [Ros73]. Monderer and Shapley studied the more general concept 


of potential games. The Consensus game (Example 4.4.7) is from [GO80], and the Graph
Coloring game (Example 4.4.9) is from [PS12b]. Although the best-response dynamics


in potential games is guaranteed to reach an equilibrium, this process might not be the 
most efficient method. Fabrikant et al. showed that there are congestion games in 
which best responses can be computed efficiently (in time polynomial in N, the sum of the 
number of players and the number of resources), but the minimum number of improving 
moves needed to reach a pure Nash equilibrium is exponential in N. 

Tragedy of the Commons was introduced in a classic paper by Hardin [Har68al, 
who attributes the idea to Lloyd. [Example 4.5.1] and are from Chapter 
1 of [Nis0d. 

The market for lemons is due to Akerlof [Ake70]. George Akerlof won the 2001 
Nobel Prize in Economics, together with A. Michael Spence and Joseph Stiglitz, “for their 
analyses of markets with asymmetric information.” 


John Nash George Akerlof 


The notion of Nash equilibrium, and Nash’s Theorem showing that every finite game 
has a Nash equilibrium, are from [Nas50a]. 

While the Nash equilibrium concept is a cornerstone of game theory, the practical 
relevance of Nash equilibria is sometimes criticized for the following reasons. First, the 
ability of a player to play their Nash equilibrium strategy depends on knowledge and, 
indeed, common knowledge of the game payoffs. Second, Nash equilibria are viewed as 
representing a selfish point of view: A player would switch actions if it improves his 
utility, even if the damage to other players’ utilities is much greater. Third, in situations 
where there are several Nash equilibria, it is unclear how players would agree on one. 
Finally, it may be difficult to find a Nash equilibrium, even in small games, when players
have bounded rationality (see, e.g., [Rub98]). In large games, a series of recent results in
computational complexity show that the problem of finding Nash equilibria is likely to be 
intractable (see, e.g., [Das13]). To quote Kamal Jain (personal 
communication), “If your laptop can’t find an equilibrium, neither can the market.” 

A number of refinements have been proposed to address multiplicity of Nash equilibria.
These include evolutionary stability (discussed in §7.1), focal points [Tho60], and trembling
hand perfect equilibria [Sel75]. The latter are limits of equilibria in perturbed games,
where each player must assign positive probability to all possible actions. In a trembling 
hand perfect equilibrium, every (weakly) dominated strategy is played with probability 0. 
See Section 7.3 in [MSZ13]. 

Regarding mixed Nash equilibria, critics sometimes doubt that players will explicitly 
randomize. There are several responses to this. First, in some contexts, e.g., the Penalty 
Kicks example from the Preface, randomness represents the uncertainty one player has 
about the selection process of the other. Second, sometimes players do explicitly random- 
ize; for example, randomness is used to decide which entrances to an airport are patrolled, 
which passengers receive extra screening, and which days discounted airline tickets will 
be sold. Finally, probabilities in a mixed Nash equilibrium may represent population 


proportions as in 


Exercises 


4.1. Modify the game of Chicken as follows. There is p € (0,1) such that, when 
a player swerves (plays S), the move is changed to drive (D) with probability p.
Write the matrix for the modified game, and show that, in this case,
the effect of increasing the value of M changes from the original version. 


4.2. Two smart students form a study group in some math class where home- 
work is handed in jointly by each group. In the last homework of the se- 
mester, each of the two students can choose to either work (“W”) or party 
(“P”). If at least one of them solves the homework that week (chooses 
“W”), then they will both receive 10 points. But solving the homework incurs
a substantial effort, worth −7 points for a student doing it alone, and
an effort worth —2 points for each student, if both students work together. 
Partying involves no effort, and if both students party, they both receive 
0 points. Assume that the students do not communicate prior to deciding 
whether they will work or party. Write this situation as a matrix game and 
determine all Nash equilibria. 


4.3. Consider the following game:

                        player II
                       C           D
     player I   A | (6,−10)     (0,10)
                B | (4,1)       (1,0)

• Show that this game has a unique mixed Nash equilibrium.

• Show that if player I can commit to playing strategy A with probability
slightly more than $x^*$ (the probability she plays A in the mixed Nash
equilibrium), then (a) player I can increase her payoff and (b) player
II also benefits, obtaining a greater payoff than he did in the Nash
equilibrium.

• Show similarly that if player II can commit to playing strategy C with
probability slightly less than $y^*$ (the probability he plays C in the
mixed Nash equilibrium), then (a) player II can increase his payoff
and (b) player I also benefits, obtaining a greater payoff than she did
in the Nash equilibrium.


4.4. Show that if the Prisoner’s Dilemma game is played k times, where a 
player’s payoff is the sum of his payoffs in the k rounds, then it is a domi- 
nant strategy to confess in every round. Hint: backwards induction. 


4.5. Two cheetahs and three antelopes: Two cheetahs each chase one of three 
antelopes. If they catch the same one, they have to share. The antelopes 
are Large, Small, and Tiny, and their values to the cheetahs are ℓ, s, and
t. Write the 3 × 3 matrix for this game. Assume that t < s < ℓ < 2s and


SH Lya Qs — 0 
= p 
e a 


Find the pure equilibria and the symmetric mixed equilibria. 


S 4.6. Consider the game below and show that there is no pure Nash equilibrium, 
only a unique mixed one. Also, show that both commitment strategy pairs 
have the property that the player who did not make the commitment still 
gets the Nash equilibrium payoff. 


player II 
= C D 
s | A | (6,—10) (0,10) 
TIB] (41) (0) 
A 
4.7. Volunteering dilemma: There are n players in a game show. Each
player is put in a separate room. If some of the players volunteer to help
the others, then each volunteer will receive $1000 and each of the remaining
players will receive $1500. If no player volunteers, then they all get zero.
Show that this game has a unique symmetric (mixed) Nash equilibrium.
Let p_n denote the probability that player 1 volunteers in this equilibrium.
Find p_2 and show that lim_{n→∞} n·p_n = log(3).


4.8. Three firms (players I, II, and III) put three items on the market and adver- 
tise them either on morning or evening TV. A firm advertises exactly once 
per day. If more than one firm advertises at the same time, their profits are 
zero. If exactly one firm advertises in the morning, its profit is $200K. If 
exactly one firm advertises in the evening, its profit is $300K. Firms must 
make their advertising decisions simultaneously. Find a symmetric mixed 
Nash equilibrium.

4.9. Consider any two-player game of the following type:

                        player II
                       A          B
     player I   A | (a,a)      (b,c)
                B | (c,b)      (d,d)

• Compute optimal safety strategies and show that they are not a Nash
equilibrium.

• Compute the mixed Nash equilibrium and show that it results in the
same player payoffs as the optimal safety strategies.


4.10. Consider Example 4.1.3. Assume that with some probability p each country
will be overtaken by extremists and will attack. Write down the game
matrix and find Nash equilibria. 


4.11. The Welfare Game: John has no job and might try to get one. Or he 
might prefer to take it easy. The government would like to aid John if he 
is looking for a job but not if he stays idle. The payoffs are 

                             jobless John
                            try        not try
     government   aid    | (3,2)      (−1,3)
                  no aid | (−1,1)     (0,0)


Find the Nash equilibria. 


* 4.12. Use Lemma to derive an exponential time algorithm for finding a Nash 
equilibrium in two-player general-sum games using linear programming. 


4.13. The game of Hawks and Doves: Find the Nash equilibria in the game of 
Hawks and Doves whose payoffs are given by the matrix: 


                        player II
                       D          H
     player I   D | (1,1)      (0,3)
                H | (3,0)      (−4,−4)


4.14. Consider the following n-person game. Each person writes down an integer 
in the range 1 to 100. A reward is given to the person whose number is 
closest to the mean. (In the case of ties, a winner is selected at random 
from among those whose number is closest to the mean.) What is a Nash 
equilibrium in this game? 


4.15. Suppose that firm III shares a lake with two randomly selected firms from
a population of firms of which proportion p purify. Show that firm III is
better off purifying if $|p - 1/2| < \sqrt{3}/6$, whereas firm III is better off
polluting if $|p - 1/2| > \sqrt{3}/6$.


4.16. Consider a k-player game, where each player has the same set S of pure
strategies. A permutation $\pi$ of the set of players {1, ..., k} is an automorphism
of the game if for every i, and every strategy profile s, we
have

    $u_i(s_1, \ldots, s_k) = u_{\pi(i)}(s_{\pi^{-1}(1)}, \ldots, s_{\pi^{-1}(k)})$.

Show that if all players have the same set of pure strategies, and for any two
players $i_0, j_0 \in \{1, \ldots, k\}$, there is an automorphism $\pi$ such that $\pi(i_0) = j_0$,
then the game is symmetric in the sense of Definition 4.3.10.


4.17. A simultaneous congestion game: There are two drivers, one who will travel 
from A to C, the other from B to D. Each road is labelled (x,y), where 
x is the cost to any driver who travels the road alone, and y is the cost to 
each driver if both drivers use this road. Write the game in matrix form, 
and find all of the pure Nash equilibria. 


[Figure: road network on the nodes A, B, C, and D; the edge labels that survive in the source are (1,5), (3,6), and (2,4).]


4.18. Consider the following market sharing game discussed in §8.3. There are
k NBA teams, and each of them must decide in which city to locate. Let
v_j be the profit potential, i.e., number of basketball fans, of city j. If ℓ
teams select city j, they each obtain a utility of v_j/ℓ. Let c = (c_1,...,c_k)
denote a strategy profile where c_i is the city selected by team i, and let
n_{c_i}(c) be the number of teams that select city c_i in this profile. Show that
the market sharing game is a potential game with potential function

φ(c) = Σ_{j∈C} Σ_{ℓ=1}^{n_j(c)} v_j/ℓ

and hence has a pure Nash equilibrium.


4.19. Show that if ψ is a potential function for a game of k players, then ψ can
be extended to strategy profiles of k - 1 players to satisfy (4.10).


4.20. Consider the following variant of the Consensus game (Example 4.4.7).
Again, consider an arbitrary undirected graph G = (V, E). In this game,
each vertex i ∈ V = {1,...,n} is a player, and her action consists of choosing a
bit in {0,1}. We represent vertex i’s choice by b_i ∈ {0,1}. Let N(i) be the
set of neighbors of i in G and write b = (b_1,...,b_n). The difference now
is that there is a weight w_ij on each edge (i, j) that measures how much
the two players i and j care about agreeing with each other. (Assume that
w_ij = w_ji.) In this case, the loss D_i(b) for player i is the total weight of
neighbors that she disagrees with; i.e.,

D_i(b) = Σ_{j∈N(i)} |b_i - b_j| w_ij.

Show that this is a potential game and that simultaneous improving moves
converge to a cycle of period at most 2.


4.21. Consider the setting of the previous exercise and show that if the weights
w_ij and w_ji are different, then the game is not a potential game.


4.22. Construct an example showing that the Graph Coloring game of Example
has a Nash equilibrium with more than χ(G) colors.


4.23. The definition of a potential game extends to infinite strategy spaces S_i:
Call ψ : ∏_i S_i → R a potential function if for all i, the function s_i ↦
ψ(s_i, s_{-i}) - u_i(s_i, s_{-i}) is constant on S_i. Show that Example 4.5.1 is a
potential game. Hint: Consider the case of two players with strategies
x, y ∈ [0,1]. It must be that

ψ(x, y) = c_y + x(1 - x - y) = c_x + y(1 - x - y);

i.e., c_y + x(1 - x) = c_x + y(1 - y).


CHAPTER 5


Existence of Nash equilibria and fixed points 


5.1. The proof of Nash’s Theorem 
Recall Nash’s Theorem: 


THEOREM 5.1.1. For any general-sum finite game with k ≥ 2 players, there
exists at least one Nash equilibrium.


We will use the following theorem that is proved in the next section. 


THEOREM 5.1.2 (Brouwer’s Fixed-Point Theorem). Suppose that K ⊂ R^d
is closed, convex, and bounded. If T : K → K is continuous, then there exists
x ∈ K such that T(x) = x.


PROOF OF NASH’S THEOREM VIA BROUWER’S THEOREM. First, consider the
case of two players. Suppose that the game is specified by payoff matrices A_{m×n}
and B_{m×n} for players I and II. Let K = Δ_m × Δ_n. We will define a continuous
map T : K → K that takes a pair of strategies (x, y) to a new pair (x̂, ŷ) with the
following properties:

(i) x̂ is a better response to y than x is, if such a response exists; otherwise
x̂ = x.
(ii) ŷ is a better response to x than y is, if such a response exists; otherwise
ŷ = y.


A fixed point of T will then be a Nash equilibrium. 

Fix the strategy y of player II. Define c_i to be the maximum of zero and the
gain player I obtains by switching from strategy x to pure strategy i. Formally, for
x ∈ Δ_m,

c_i := c_i(x, y) := max{A_i y - x^T A y, 0},

where A_i denotes the i-th row of the matrix A. Define x̂ ∈ Δ_m by

x̂_i := (x_i + c_i) / (1 + Σ_{k=1}^m c_k);

i.e., the weight of each action for player I is increased according to its performance
against the mixed strategy y.
Similarly, let

d_j := d_j(x, y) := max{x^T B^j - x^T B y, 0},

where B^j denotes the j-th column of B, and define ŷ ∈ Δ_n by

ŷ_j := (y_j + d_j) / (1 + Σ_{k=1}^n d_k).

Finally, let T(x, y) = (x̂, ŷ).


We claim that property (i) holds for this mapping. If c_i = 0 (i.e., x^T A y ≥ A_i y)
for all i, then x̂ = x is a best response to y. Otherwise, S := Σ_{i=1}^m c_i > 0. We need
to show that

Σ_{i=1}^m x̂_i A_i y > x^T A y.    (5.1)

Multiplying both sides by 1 + S, this is equivalent to

Σ_{i=1}^m (x_i + c_i) A_i y > (1 + S) x^T A y,

which holds since Σ_{i=1}^m x_i A_i y = x^T A y and

Σ_{i=1}^m c_i A_i y > S x^T A y = Σ_{i=1}^m c_i x^T A y.

Similarly, property (ii) is satisfied.

Finally, we observe that K is convex, closed, and bounded and that T is continuous
since c_i and d_j are. Thus, an application of Brouwer’s theorem shows that
there exists (x, y) ∈ K for which T(x, y) = (x, y); by properties (i) and (ii), (x, y)
is a Nash equilibrium.
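The map T of the two-player proof is easy to compute directly. The following is a minimal sketch in plain Python, not part of the original argument; the function name nash_map and the Matching Pennies payoffs used as a test case are illustrative assumptions. It shows that an equilibrium is left fixed, while a non-equilibrium profile is moved toward a better response.

```python
def nash_map(A, B, x, y):
    """One application of the map T from the proof: (x, y) -> (x_hat, y_hat)."""
    m, n = len(A), len(A[0])
    # Payoff of each pure row i against y, and of each pure column j against x.
    row_pay = [sum(A[i][j] * y[j] for j in range(n)) for i in range(m)]
    col_pay = [sum(x[i] * B[i][j] for i in range(m)) for j in range(n)]
    # Current expected payoffs x^T A y and x^T B y.
    u1 = sum(x[i] * row_pay[i] for i in range(m))
    u2 = sum(y[j] * col_pay[j] for j in range(n))
    # c_i and d_j: positive parts of the gains from switching to a pure strategy.
    c = [max(row_pay[i] - u1, 0.0) for i in range(m)]
    d = [max(col_pay[j] - u2, 0.0) for j in range(n)]
    x_hat = [(x[i] + c[i]) / (1.0 + sum(c)) for i in range(m)]
    y_hat = [(y[j] + d[j]) / (1.0 + sum(d)) for j in range(n)]
    return x_hat, y_hat


# Matching Pennies (an assumed example): the uniform profile is the equilibrium.
A = [[1, -1], [-1, 1]]
B = [[-1, 1], [1, -1]]
fixed = nash_map(A, B, [0.5, 0.5], [0.5, 0.5])   # equilibrium: unchanged by T
moved = nash_map(A, B, [0.5, 0.5], [1.0, 0.0])   # player I shifts weight to row 0
```

In the second call, player I's expected payoff against y = (1, 0) rises from 0 to 1/2, in line with property (i).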

For k > 2 players, we define for each player j and pure strategy ℓ of that
player the quantity c_ℓ^{(j)}, which is the gain player j gets by switching from his current
strategy x^{(j)} to pure strategy ℓ, if positive, given the current strategies of all the
other players. The rest of the argument follows as before.


Proposition 4.3.11 claimed that in a symmetric game, there is always a symmetric
Nash equilibrium.


PROOF OF PROPOSITION 4.3.11. The map T, defined in the preceding proof
from the k-fold product Δ_n × ··· × Δ_n to itself, can be restricted to the diagonal

D = {(x, ..., x) ∈ Δ_n^k : x ∈ Δ_n}.

The image of D under T is a subset of D, because, in a symmetric game,

c_i^{(1)}(x, ..., x) = ··· = c_i^{(k)}(x, ..., x)

for all pure strategies i and x ∈ Δ_n. Brouwer’s Fixed-Point Theorem yields a fixed
point within D, which is a symmetric Nash equilibrium.


5.2. Fixed-point theorems* 


Brouwer’s Theorem is straightforward in dimension 1. Given a continuous map
T : [a,b] → [a,b], define f(x) := T(x) - x. Clearly, f(a) ≥ 0, while f(b) ≤ 0. By the
Intermediate Value Theorem, there is x ∈ [a,b] for which f(x) = 0, so T(x) = x.
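In one dimension the Intermediate Value Theorem argument is also constructive: bisection on f(x) = T(x) - x locates a fixed point. The sketch below is illustrative only; the helper name fixed_point_1d and the choice T = cos on [0, 1] are assumptions made for the example.

```python
import math


def fixed_point_1d(T, a, b, tol=1e-10):
    """Bisection on f(x) = T(x) - x for a continuous T : [a, b] -> [a, b]."""
    lo, hi = a, b                      # invariant: f(lo) >= 0 and f(hi) <= 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if T(mid) - mid >= 0:
            lo = mid                   # a sign change still lies in [mid, hi]
        else:
            hi = mid                   # a sign change still lies in [lo, mid]
    return (lo + hi) / 2


# cos maps [0, 1] into [cos 1, 1], a subset of [0, 1], so the hypothesis holds.
x_star = fixed_point_1d(math.cos, 0.0, 1.0)
```

No such halving procedure is available in higher dimensions, which is one reason the general theorem is harder.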

In higher dimensions, Brouwer’s Theorem is rather subtle; in particular, there
is no generally applicable recipe to find or approximate a fixed point, and there
may be many fixed points. Thus, before we turn to a proof of Brouwer’s Theorem, we
discuss some easier fixed point theorems, where iteration of the mapping from any
starting point converges to the fixed point.


FIGURE 5.1. Under the transformation T a square is mapped to a smaller
square, rotated with respect to the original. When iterated repeatedly, the
map produces a sequence of nested squares. If we were to continue this process
indefinitely, a single point (fixed by T) would emerge.


5.2.1. Easier fixed-point theorems. Banach’s Fixed-Point Theorem applies
when the mapping T contracts distances, as in Figure 5.1.

Recall that a metric space is complete if each Cauchy sequence therein converges
to a point in the space. For example, any closed subset of R^n endowed with
the Euclidean metric is complete.


THEOREM 5.2.1 (Banach’s Fixed-Point Theorem). Let K be a complete
metric space. Suppose that T : K → K satisfies d(Tx, Ty) ≤ λ d(x, y) for all
x, y ∈ K, with 0 < λ < 1 fixed. Then T has a unique fixed point z ∈ K. Moreover,
for any x ∈ K, we have

d(T^n x, z) ≤ d(x, Tx) λ^n / (1 - λ).

PROOF. Uniqueness of fixed points: If Tx = x and Ty = y, then

d(x, y) = d(Tx, Ty) ≤ λ d(x, y).

Thus, d(x, y) = 0, so x = y.

As for existence, given any x ∈ K, we define x_n = T x_{n-1} for each n ≥ 1,
setting x_0 = x. Set a = d(x_0, x_1), and note that d(x_n, x_{n+1}) ≤ λ^n a. If k > n, then
by the triangle inequality,

d(x_n, x_k) ≤ d(x_n, x_{n+1}) + ··· + d(x_{k-1}, x_k)
            ≤ a(λ^n + ··· + λ^{k-1}) ≤ a λ^n / (1 - λ).    (5.2)

This implies that {x_n : n ∈ N} is a Cauchy sequence. The metric space K is
complete, whence x_n → z as n → ∞. Note that

d(z, Tz) ≤ d(z, x_n) + d(x_n, x_{n+1}) + d(x_{n+1}, Tz) ≤ (1 + λ) d(z, x_n) + λ^n a → 0

as n → ∞. Hence, d(z, Tz) = 0, and Tz = z.
Thus, letting k → ∞ in (5.2) yields

d(T^n x, z) = d(x_n, z) ≤ a λ^n / (1 - λ).
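The error bound in Banach’s theorem can be checked numerically. The following sketch is an illustration under stated assumptions: the contraction T(x) = cos(x)/2 on R (for which λ = 1/2 works, since |T'(x)| ≤ 1/2) and the helper name iterate are choices made here, not taken from the text.

```python
import math


def iterate(T, x, n):
    """Return the orbit [x_0, x_1, ..., x_n] with x_0 = x and x_k = T(x_{k-1})."""
    orbit = [x]
    for _ in range(n):
        orbit.append(T(orbit[-1]))
    return orbit


def T(x):
    # Assumed example of a contraction on R with constant lam = 1/2.
    return math.cos(x) / 2


lam = 0.5
orbit = iterate(T, 3.0, 60)
z = orbit[-1]                     # after 60 steps, a numerical stand-in for the fixed point
a = abs(orbit[0] - orbit[1])      # a = d(x_0, x_1)

# Check d(T^n x, z) <= a * lam^n / (1 - lam) for the first 40 iterates.
bounds_hold = all(
    abs(orbit[n] - z) <= a * lam ** n / (1 - lam) + 1e-12
    for n in range(40)
)
```

The small additive tolerance only absorbs floating-point rounding; the geometric bound itself is the one proved above.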

As the next theorem shows, the strong contraction assumption in Banach’s
Fixed-Point Theorem can be relaxed to decreasing distances if the space is compact.
Recall that a metric space is compact if each sequence therein has a subsequence
that converges to a point in the space. A subset of the Euclidean space R^d is
compact if and only if it is closed and bounded.

See Exercise 5.2 for an example of a map T : R → R that decreases distances
but has no fixed points.


Licensed to AMS. 
License or copyright restrictions may apply to redistribution; see http://www.ams.org/publications/ebooks/terms 




THEOREM 5.2.2 (Compact Fixed-Point Theorem). If K is a compact metric
space and T : K → K satisfies d(T(x), T(y)) < d(x, y) for all x ≠ y ∈ K, then
T has a unique fixed point z ∈ K. Moreover, for any x ∈ K, we have T^n(x) → z.


PROOF. Let f : K → R be given by f(x) := d(x, Tx). We first show that f is
continuous. By the triangle inequality we have

d(x, Tx) ≤ d(x, y) + d(y, Ty) + d(Ty, Tx),

so

f(x) - f(y) ≤ d(x, y) + d(Ty, Tx) ≤ 2d(x, y).

By symmetry, we also have f(y) - f(x) ≤ 2d(x, y), and hence f is continuous.
Since K is compact, there exists z ∈ K such that

f(z) = min_{x∈K} f(x).    (5.3)

If Tz ≠ z, then f(T(z)) = d(Tz, T^2 z) < d(z, Tz) = f(z), and we have a contradiction
to the minimizing property of z. Thus Tz = z. Uniqueness is
obvious.

Finally, we observe that iteration converges from any starting point x. Let
x_n = T^n x, and suppose that x_n does not converge to z. Then for some ε > 0, the
set S = {n | d(x_n, z) ≥ ε} is infinite. Let {n_k} ⊂ S be an increasing sequence such
that y_k := x_{n_k} → y ≠ z. Now

d(Ty_k, z) → d(Ty, z) < d(y, z).    (5.4)

But T^{n_{k+1} - n_k - 1}(Ty_k) = y_{k+1}, so

d(Ty_k, z) ≥ d(y_{k+1}, z) ≥ d(y, z),

contradicting (5.4).


EXERCISE 5.a. Prove that the convergence in the Compact Fixed-Point Theorem
can be arbitrarily slow by showing that for any decreasing sequence {a_n}_{n≥0}
tending to 0, there is a distance decreasing T : [0, a_0] → [0, a_0] such that T(0) = 0
and d(T^n a_0, 0) ≥ a_n for all n.


5.2.2. Sperner’s Lemma. In this section, we establish a combinatorial lemma 
that is key to proving Brouwer’s Fixed-Point Theorem. 


LEMMA 5.2.3 (Sperner’s Lemma). In d = 1: Suppose that the unit interval
is subdivided 0 = t_0 < t_1 < ··· < t_n = 1, with each t_i being marked zero or one. If
t_0 is marked zero and t_n is marked one, then the number of adjacent pairs (t_j, t_{j+1})
with different markings is odd.

In d = 2: Subdivide a triangle into smaller triangles in such a way that a vertex
of any of the small triangles may not lie in the interior of an edge of another. Label
the vertices of the small triangles 0, 1 or 2: the three vertices of the big triangle
must be labeled 0, 1, and 2; vertices of the small triangles that lie on an edge of the
big triangle must receive the label of one of the endpoints of that edge. Then the
number of properly labeled¹ small triangles is odd; in particular, it is non-zero.


¹ All three vertices have different labels.




FIGURE 5.2. Sperner’s lemma when d = 2. 


PROOF. For d = 1, this is obvious: In a string of bits that starts with 0 and 
ends with 1, the number of bit flips is odd. 

For d = 2, we will count in two ways the set Q of pairs consisting of a small
triangle and an edge labeled 12 on that triangle. Let A_12 denote the number of
12-labeled edges of small triangles that lie on the boundary of the big triangle. Let
B_12 be the number of such edges in the interior. Let N_abc denote the number of
small triangles where the three labels are a, b and c. Note that

N_012 + 2N_112 + 2N_122 = |Q| = A_12 + 2B_12,

because the left-hand side counts the contribution to Q from each small triangle
and the right-hand side counts the contribution to Q from each 12-labeled edge.
From the case d = 1, we know that A_12 is odd, and hence N_012 is odd too.
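The parity statement for d = 2 can be tested empirically. The sketch below is an illustration, not part of the proof: it assumes a particular subdivision (the regular triangular grid with N segments per side, vertices indexed by (i, j) with i + j ≤ N), and the function names are hypothetical. Labels are drawn at random subject to the boundary rule of the lemma, and the properly labeled small triangles are counted.

```python
import random


def count_proper_triangles(N, label):
    """Count small triangles of the size-N triangular grid with label set {0, 1, 2}."""
    count = 0
    for i in range(N):
        for j in range(N - i):
            # "Upward" triangle with corners (i, j), (i+1, j), (i, j+1).
            if {label[i, j], label[i + 1, j], label[i, j + 1]} == {0, 1, 2}:
                count += 1
            # "Downward" triangle, present when (i+1, j+1) is still in the grid.
            if i + j <= N - 2:
                down = {label[i + 1, j], label[i, j + 1], label[i + 1, j + 1]}
                if down == {0, 1, 2}:
                    count += 1
    return count


def random_sperner_labeling(N, rng):
    """Random labels obeying the boundary rule of Sperner's Lemma."""
    label = {}
    for i in range(N + 1):
        for j in range(N + 1 - i):
            bary = (N - i - j, i, j)   # barycentric coordinates, scaled by N
            # A vertex may only use labels of corners that carry positive weight,
            # which forces the corner labels and the edge-endpoint rule.
            allowed = [k for k in range(3) if bary[k] > 0]
            label[i, j] = rng.choice(allowed)
    return label


rng = random.Random(0)
counts = [count_proper_triangles(8, random_sperner_labeling(8, rng))
          for _ in range(20)]
```

Each entry of counts should be odd, whatever the random seed.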


For another proof, see Figure 5.3.


REMARK 5.2.4. Sperner’s Lemma can be generalized to higher dimensions. See
§5.4.


5.2.3. Brouwer’s Fixed-Point Theorem. 


DEFINITION 5.2.5. A set S ⊂ R^d has the fixed-point property (abbreviated
f.p.p.) if for any continuous function T : S → S, there exists x ∈ S such that
T(x) = x.


Brouwer’s Theorem asserts that every closed, bounded, convex set K ⊂ R^d has
the f.p.p. Each of the hypotheses on K in the theorem is needed, as the following
examples show:

(1) K = R (closed, convex, not bounded) with T(x) = x + 1.
(2) K = (0,1) (bounded, convex, not closed) with T(x) = x/2.
(3) K = {x ∈ R : |x| ∈ [1,2]} (bounded, closed, not convex) with T(x) = -x.


REMARK 5.2.6. On first reading of the following proof, the reader should take
n = 2. In two dimensions, simplices are triangles. To understand the proof for
n > 2, the section on Sperner’s Lemma in higher dimensions should be read first.


THEOREM 5.2.7 (Brouwer’s Fixed-Point Theorem for the simplex). The
standard n-simplex Δ = {x | Σ_{i=0}^n x_i = 1, ∀i x_i ≥ 0} has the fixed-point property.


FIGURE 5.3. Sperner’s Lemma: The left side of the figure shows a labeling
and the three fully labeled subtriangles it induces. The right side of the figure
illustrates an alternative proof of the case d = 2: Construct a graph G with
a node inside each small triangle, as well as a vertex outside each 1-3 labeled
edge on the outer right side of the big triangle. Put an edge in G between
each pair of vertices separated only by a 1-3 labeled edge. In the resulting
graph G (whose edges are shown in purple), each vertex has degree either 0, 1
or 2, so the graph consists of paths and cycles. Moreover, each vertex outside
the big triangle has degree 0 or 1, and an odd number of these vertices have
degree 1. Therefore, at least one (in fact an odd number) of the paths starting
at these degree 1 vertices must end at a vertex interior to the large triangle.
Each of the latter vertices lies inside a properly labeled small triangle. This
is the highlighted subtriangle.


PROOF. Let Γ be a subdivision (as in Sperner’s Lemma) of Δ where all triangles
(or, in higher dimension, simplices) have diameter at most ε. Given a continuous
mapping T : Δ → Δ, write T(x) = (T_0(x), ..., T_n(x)). For any vertex x of Γ, let

ℓ(x) = min{i : T_i(x) < x_i}.

(Note that since Σ_{i=0}^n x_i = 1 and Σ_{i=0}^n T_i(x) = 1, if there is no i with T_i(x) < x_i,
then x is a fixed point.)

By Sperner’s Lemma, there is a properly labeled simplex Δ_1 in Γ, and this can
already be used to produce an approximate fixed point of T; see the remark below.

To get a fixed point, find, for each k, a simplex with vertices {z^i(k)}_{i=0}^n in Δ
and diameter at most 1/k, satisfying

T_i(z^i(k)) < z^i_i(k) for all i ∈ [0, n].    (5.5)

Find a convergent subsequence z^0(k_j) → z and observe that z^i(k_j) → z for all
i. Thus, T_i(z) ≤ z_i for all i, so T(z) = z.


REMARK 5.2.8. Let Δ_1 be a properly labeled simplex of diameter at most ε as
in the proof above. Denote by z^0, z^1, ..., z^n the vertices of Δ_1, where ℓ(z^i) = i.
Let ω(ε) := max_{|x-y|≤ε} |T(x) - T(y)|. Then

T_i(z^0) ≤ T_i(z^i) + ω(ε) < z^i_i + ω(ε) ≤ z^0_i + ε + ω(ε).

On the other hand,

T_i(z^0) = 1 - Σ_{j≠i} T_j(z^0) ≥ 1 - Σ_{j≠i} (z^0_j + ε + ω(ε)) = z^0_i - n(ε + ω(ε)).

Thus,

|T(z^0) - z^0| ≤ n(n + 1)(ε + ω(ε)),

so z^0 is an approximate fixed point.
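The remark suggests a simple procedure for n = 2: label the vertices of a fine grid by ℓ(x) = min{i : T_i(x) < x_i} and search for a properly labeled small triangle. The sketch below is illustrative and rests on assumptions not in the text: the regular triangular grid with N segments per side, exact arithmetic via fractions.Fraction, a brute-force scan, and an averaging map T chosen as a test case (its fixed point is (1/3, 1/3, 1/3)).

```python
from fractions import Fraction


def label(T, x):
    """l(x) = min{i : T_i(x) < x_i}; None means x is a fixed point."""
    Tx = T(x)
    for i in range(len(x)):
        if Tx[i] < x[i]:
            return i
    return None


def approximate_fixed_point(T, N):
    """Scan the size-N grid on the 2-simplex for a properly labeled small triangle."""
    def point(i, j):
        return (Fraction(N - i - j, N), Fraction(i, N), Fraction(j, N))

    lab = {}
    for i in range(N + 1):
        for j in range(N + 1 - i):
            lab[i, j] = label(T, point(i, j))
            if lab[i, j] is None:
                return point(i, j)             # exact fixed point on the grid
    for i in range(N):
        for j in range(N - i):
            tris = [[(i, j), (i + 1, j), (i, j + 1)]]
            if i + j <= N - 2:
                tris.append([(i + 1, j), (i, j + 1), (i + 1, j + 1)])
            for tri in tris:
                if {lab[v] for v in tri} == {0, 1, 2}:
                    return point(*tri[0])      # any vertex of the triangle will do
    return None   # not reached: Sperner's Lemma guarantees such a triangle


def T(x):
    # Assumed test map of the simplex into itself (coordinates still sum to 1).
    return ((x[0] + x[2]) / 2, (x[0] + x[1]) / 2, (x[1] + x[2]) / 2)


N = 20
z = approximate_fixed_point(T, N)
error = max(abs(t - c) for t, c in zip(T(z), z))
```

For this T, measured in the maximum norm, ε = 1/N and ω(ε) ≤ ε, so the coordinatewise estimates of the remark give error ≤ n(ε + ω(ε)) ≤ 4/N.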


DEFINITION 5.2.9. Let S ⊂ R^d and S̃ ⊂ R^n. A homeomorphism h : S → S̃
is a one-to-one continuous map with a continuous inverse.


DEFINITION 5.2.10. Let S ⊂ A ⊂ R^d. A retraction g : A → S is a continuous
map where g restricted to S is the identity map.


LEMMA 5.2.11. Let S ⊂ A ⊂ R^d and S̃ ⊂ R^n.
(i) If S has the f.p.p. and h : S → S̃ is a homeomorphism, then S̃ has the
f.p.p.
(ii) If g : A → S is a retraction and A has the f.p.p., then S has the f.p.p.


PROOF. (i) Given T : S̃ → S̃ continuous, let x ∈ S be a fixed point of h^{-1} ∘ T ∘ h :
S → S. Then h(x) is a fixed point of T.
(ii) Given T : S → S, any fixed point of T ∘ g : A → S is a fixed point of T.


LEMMA 5.2.12. For K ⊂ R^d closed and convex, the nearest-point map Ψ :
R^d → K where

‖x - Ψ(x)‖ = d(x, K) := min_{y∈K} ‖x - y‖

is uniquely defined and continuous.


PROOF. For uniqueness, suppose that ‖x - y‖ = ‖x - z‖ = d(x, K) with
y, z ∈ K. Assume by translation that x = 0. Since (y + z)/2 ∈ K, we have

d(0, K)^2 + ‖y - z‖^2/4 ≤ ‖y + z‖^2/4 + ‖y - z‖^2/4 = (‖y‖^2 + ‖z‖^2)/2 = d(0, K)^2,

so y = z.
To show continuity, let Ψ(x) = y and Ψ(x + u) = y + v. We show that
‖v‖ ≤ ‖u‖.

We know from the proof of the Separating Hyperplane Theorem that

v^T (y - x) ≥ 0

and

v^T (x + u - y - v) ≥ 0.

Adding these gives v^T (u - v) ≥ 0, so

‖v‖^2 = v^T v ≤ v^T u ≤ ‖v‖ ‖u‖

by the Cauchy-Schwarz inequality. Thus ‖v‖ ≤ ‖u‖.
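The inequality ‖v‖ ≤ ‖u‖ says that the nearest-point map never increases distances. A quick numerical check is sketched below for one concrete convex set, the box [0,1]^3, whose nearest-point map is coordinatewise clipping; the set, the function names, and the random test points are assumptions made for illustration.

```python
import math
import random


def project_to_box(x, lo=0.0, hi=1.0):
    """Nearest point of the box [lo, hi]^d to x (coordinatewise clipping)."""
    return [min(max(t, lo), hi) for t in x]


def dist(p, q):
    """Euclidean distance between two points given as lists."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


rng = random.Random(1)
nonexpansive = True
for _ in range(1000):
    x = [rng.uniform(-3, 3) for _ in range(3)]
    u = [rng.uniform(-1, 1) for _ in range(3)]
    xu = [a + b for a, b in zip(x, u)]
    v_norm = dist(project_to_box(x), project_to_box(xu))   # ||v||
    u_norm = dist(x, xu)                                    # ||u||
    nonexpansive = nonexpansive and v_norm <= u_norm + 1e-12
```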


PROOF OF BROUWER’S THEOREM (THEOREM 5.1.2). Let K ⊂ R^d be compact
and convex. There is a simplex Δ_0 that contains K. Clearly Δ_0 is homeomorphic to
a standard simplex, so it has the f.p.p. by Lemma 5.2.11(i). Then by Lemma 5.2.12
the nearest point map Ψ : Δ_0 → K is a retraction. Thus, Lemma 5.2.11(ii) implies
that K has the f.p.p.


FIGURE 5.4. Illustration of the continuity argument in Lemma 5.2.12.


5.3. Brouwer’s Fixed-Point Theorem via Hex* 


In this section, we present a proof of Theorem 5.1.2 via Hex. Thinking of a Hex
board as a hexagonal lattice, we can construct what is known as a dual lattice in
the following way: The nodes of the dual are the centers of the hexagons and the
edges link every two neighboring nodes (those are a unit distance apart).

Coloring the hexagons is now equivalent to coloring the nodes. 




FIGURE 5.5. Hexagonal lattice and its dual triangular lattice. 


This lattice is generated by two vectors u, v ∈ R^2 as shown on the left side of
Figure 5.6. The set of nodes can be described as {au + bv : a, b ∈ Z}. Let’s put
u = (0,1) and v = (√3/2, 1/2). Two nodes x and y are neighbors if ‖x - y‖ = 1.

We can obtain a more convenient representation of this lattice by applying a 
linear transformation G defined by 


G(u) = (-√2/2, √2/2);   G(v) = (0,1).


The game of Hex can be thought of as a game on the corresponding graph
(see Figure 5.7). There, a Hex move corresponds to coloring one of the nodes. A
player wins if she manages to create a connected subgraph consisting of nodes in
her assigned color, which also includes at least one node from each of the two sets
of her boundary nodes.


FIGURE 5.6. Action of G on the generators of the lattice.


FIGURE 5.7. Under G an equilateral triangular lattice is transformed to an 
equivalent lattice. 


The fact that any colored graph contains one and only one such subgraph is 
inherited from the corresponding theorem for the original Hex board. 


PROOF OF BROUWER’S THEOREM USING HEX. As noted earlier, the fact
that there is a winner in any play of Hex is the discrete analogue of the two-dimensional
Brouwer fixed-point theorem. We now use this fact about Hex (proved
as Theorem 1.2.6) to prove Brouwer’s theorem, at least in two dimensions.

By Lemma 5.2.11 we may restrict our attention to a unit square. Consider
a continuous map T : [0,1]^2 → [0,1]^2. Componentwise we write T(x) =
(T_1(x), T_2(x)). Suppose it has no fixed points. Then define a function f(x) =
T(x) - x. The function f is never zero and continuous on a compact set; hence
‖f‖ has a positive minimum ε > 0. In addition, as a continuous map on a compact
set, T is uniformly continuous; hence ∃δ > 0 such that ‖x - y‖ < δ implies
‖T(x) - T(y)‖ < ε. Take such a δ with a further requirement δ < (√2 - 1)ε. (In
particular, δ < ε/√2.)


Consider a Hex board drawn in [0,1]^2 such that the distance between neighboring
vertices is at most δ, as shown in Figure 5.8. Color a vertex v on the board yellow
if |f_1(v)| is at least ε/√2. If a vertex v is not yellow, then ‖f(v)‖ ≥ ε implies that
|f_2(v)| is at least ε/√2; in this case, color v blue. We know from Hex that in this
coloring, there is a winning path, say, in yellow, between certain boundary vertices
a and b. For the vertex a* neighboring a on this yellow path, we have 0 < a*_1 ≤ δ.
Also, the range of T is in [0,1]^2. Since a* is yellow, |T_1(a*) - a*_1| ≥ ε/√2, and by
the requirement on δ, we necessarily have T_1(a*) - a*_1 ≥ ε/√2. Similarly, for the
vertex b* neighboring b, we have T_1(b*) - b*_1 ≤ -ε/√2. Examining the vertices
on this yellow path one-by-one from a* to b*, we must find neighboring vertices u
and v such that T_1(u) - u_1 ≥ ε/√2 and T_1(v) - v_1 ≤ -ε/√2. Therefore,

T_1(u) - T_1(v) ≥ 2ε/√2 - (v_1 - u_1) ≥ √2 ε - δ > ε.

However, ‖u - v‖ ≤ δ should also imply ‖T(u) - T(v)‖ < ε, a contradiction.




FIGURE 5.8. Proving Brouwer via Hex. 


5.4. Sperner’s Lemma in higher dimensions* 


DEFINITION 5.4.1 (Simplex). An n-simplex Δ(v_0, v_1, ..., v_n) is the convex
hull of a set of n + 1 points v_0, v_1, ..., v_n ∈ R^n that are affinely independent; i.e.,
the n vectors v_i - v_0, for 1 ≤ i ≤ n, are linearly independent.


DEFINITION 5.4.2 (Face). A k-face of an n-simplex Δ(v_0, v_1, ..., v_n) is the
convex hull of any k + 1 of the points v_0, v_1, ..., v_n. (See the left side of Figure 5.9.)


EXERCISE 5.b.
(1) Show that n + 1 points v_0, v_1, ..., v_n ∈ R^d are affinely independent if and
only if for every non-zero vector (α_0, ..., α_n) for which Σ_{0≤i≤n} α_i = 0, it
must be that Σ_{0≤i≤n} α_i v_i ≠ 0. Thus, affine independence is a symmetric
notion.

(2) Show that a k-face of an n-simplex is a k-simplex.


DEFINITION 5.4.3 (Subdivision of a simplex). A subdivision of a simplex
Δ(v_0, v_1, ..., v_n) is a collection Γ of n-simplices such that for every two simplices
in Γ, either they are disjoint or their intersection is a face of both.


REMARK 5.4.4. Call an (n - 1)-face of Δ_1 ∈ Γ an outer face if it lies on an
(n - 1)-face of Δ(v_0, v_1, ..., v_n); otherwise, call it an inner face. (See the right
side of Figure 5.9.) It follows from the definition of subdivision that each inner face
of Δ_1 ∈ Γ is an (n - 1)-face of exactly one other simplex in Γ. Moreover, if F is an
(n - 1)-face of Δ(v_0, v_1, ..., v_n), then

Γ(F) := {Δ_1 ∩ F}_{Δ_1 ∈ Γ}

is a subdivision of F. (See Figure 5.10.)






FIGURE 5.9. The left side shows a 2-simplex and its faces. The right side 
shows an inner face and an outer face in a subdivision. 




FIGURE 5.10. The figure shows a subdivision and its restriction to face F. 


LEMMA 5.4.5. For any simplex Δ(v_0, v_1, ..., v_n) and ε > 0, there is a subdivision
Γ such that all simplices in Γ have diameter less than ε.


The case of n = 2 is immediate. See Figure 5.12. To prove the lemma in higher
dimensions, we introduce barycentric subdivision defined as follows: Each subset
S ⊂ {0,...,n} defines a face of the simplex of dimension |S| - 1, the convex hull
of v_i, for i ∈ S. The average

v_S := (1/|S|) Σ_{i∈S} v_i

is called the barycenter of the face. Define a graph G_Δ on the vertices v_S with an
edge (v_S, v_T) if and only if S ⊂ T. Each simplex in the subdivision is the convex
hull of the vertices in a maximum clique in G_Δ.
Such a maximum clique corresponds to a collection of subsets S_0 ⊂ S_1 ⊂
S_2 ⊂ ··· ⊂ S_n with |S_i| = i + 1. Thus there is a permutation π on {0,...,n} with
π(0) = S_0 and π(i) = S_i \ S_{i-1} for all i ≥ 1. If we write w_i = v_{π(i)}, then

v_{S_k} = (w_0 + w_1 + ··· + w_k) / (k + 1).

Thus the vertices of this clique are

w_0, (w_0 + w_1)/2, (w_0 + w_1 + w_2)/3, ..., (1/(n + 1)) Σ_{i=0}^n w_i.

The convex hull of these vertices, which we denote by Δ_π, is

Δ_π := {Σ_{0≤i≤n} α_i v_i | α_{π(0)} ≥ ··· ≥ α_{π(n)} ≥ 0 and Σ_{0≤i≤n} α_i = 1}.

The full subdivision is Γ_1 = {Δ_π | π a permutation of {0,...,n}}. See the two
figures below.


EXERCISE 5.c.

(1) Verify the affine independence of v_{S_0}, ..., v_{S_n}.
(2) Verify that Δ_π is the convex hull of v_{S_0}, ..., v_{S_n}, where π(i) = S_i \ S_{i-1}.




FIGURE 5.11. This figure shows two steps of barycentric subdivision in two dimensions. 




FIGURE 5.12. The left-hand side of this figure shows (most of) the vertices 
resulting from one step of barycentric subdivision in 3 dimensions. The green 
vertices are barycenters of simplices of dimension 1, the purple vertices (not 
all shown) are barycenters of simplices of dimension 2, and the pink vertex is 
the barycenter of the full simplex. The right-hand side of this figure shows 
two of the subsimplices that would result from barycentric subdivision. The 
upper subsimplex outlined corresponds to the permutation {0,3,1,2} and the 
bottom subsimplex corresponds to the permutation {2,3,1,0}. 


PROOF OF LEMMA 5.4.5. The diameter of each simplex Δ_π in Γ_1 is the maximum
distance between any two vertices in Δ_π. We claim this diameter is at
most (n/(n + 1)) D, where D is the diameter of Δ(v_0, ..., v_n). Indeed, for any k, r in
{1, ..., n + 1} with r ≤ k,

‖(1/k) Σ_{i=0}^{k-1} w_i - (1/r) Σ_{j=0}^{r-1} w_j‖ = (1/(kr)) ‖Σ_{i=0}^{k-1} Σ_{j=0}^{r-1} (w_i - w_j)‖
    ≤ ((kr - r)/(kr)) D = ((k - 1)/k) D ≤ (n/(n + 1)) D,

since the r terms with i = j vanish and each of the remaining kr - r terms has norm at most D.

Iterating the barycentric subdivision m times yields a subdivision Γ_m in which
the maximum diameter of any simplex is at most (n/(n + 1))^m D.
See Exercise 5.d below for the verification that this subdivision has the required
intersection property.
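One step of barycentric subdivision is short to write down: each permutation π yields the simplex with vertices w_0, (w_0 + w_1)/2, and so on. The sketch below is illustrative; the function names and the 3-4-5 triangle used as input are assumptions. It builds the (n + 1)! pieces and compares their diameters with the bound (n/(n + 1)) D.

```python
import math
from itertools import combinations, permutations


def barycentric_subdivision(vertices):
    """Return the (n+1)! subsimplices, each as a list of n+1 barycenters."""
    count = len(vertices)                     # n + 1 points
    dim = len(vertices[0])
    simplices = []
    for perm in permutations(range(count)):
        w = [vertices[p] for p in perm]       # w_i = v_{pi(i)}
        simplex = []
        for k in range(count):
            # Barycenter (w_0 + ... + w_k) / (k + 1) of the face indexed by S_k.
            simplex.append(tuple(
                sum(w[i][c] for i in range(k + 1)) / (k + 1)
                for c in range(dim)
            ))
        simplices.append(simplex)
    return simplices


def diameter(points):
    """Largest Euclidean distance between two of the given points."""
    return max(math.dist(p, q) for p, q in combinations(points, 2))


triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]    # assumed example, n = 2
pieces = barycentric_subdivision(triangle)
D = diameter(triangle)
max_piece = max(diameter(s) for s in pieces)
```

Because a simplex is the convex hull of its vertices, its diameter is attained at a pair of vertices, so diameter only needs to compare those.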


The following corollary will be useful in Chapter 11.


COROLLARY 5.4.6. Let Δ be a simplex on k vertices. Let Γ be any subdivision
of Δ obtained by iterative barycentric subdivision. Let G_Γ be the graph whose edges
are the 1-faces of Γ. Then G_Γ can be properly colored with k colors; i.e., all vertices
in each subsimplex have different colors.


PROOF. To see that such a proper coloring exists, suppose that Γ is constructed
by iterating barycentric subdivision m times. Then color each vertex in the (m - 1)st
barycentric subdivision with c_0. Within each of the subsimplices in this level, color
each vertex that is a barycenter of a face of dimension i with c_i. Since every edge
connects barycenters of faces of different dimension, this is a proper coloring.


FIGURE 5.13. This picture shows the coloring of Corollary 5.4.6 for a simplex
on three vertices to which iterative barycentric subdivision has been applied
twice. The subdivision after one step is highlighted in gray. The corresponding
vertices after the first subdivision are colored black (c_3). All vertices that are
barycenters of dimension 1 are colored green (c_1), and all vertices that are
barycenters of dimension 2 are colored purple (c_2).


EXERCISE 5.d. (1) Verify that Δ_π has one outer face determined by the equation
α_{π(n)} = 0 and n inner faces determined by the equations α_{π(k)} = α_{π(k+1)} for
0 ≤ k ≤ n - 1. (2) Verify that Γ_1 is indeed a subdivision. (3) Verify that for
any (n - 1)-face F of Δ(v_0, v_1, ..., v_n), the subdivision Γ_1(F) is the barycentric
subdivision of F.


DEFINITION 5.4.7 (Proper labeling of a simplex). A labeling ℓ of the vertices
of an n-simplex Δ(v_0, v_1, ..., v_n) is proper if ℓ(v_0), ℓ(v_1), ..., ℓ(v_n) are all
different.

DEFINITION 5.4.8 (Sperner labeling of a subdivision). A Sperner labeling
ℓ of the vertices in a subdivision Γ of an n-simplex Δ(v_0, v_1, ..., v_n) is a labeling in
which
• Δ(v_0, v_1, ..., v_n) is properly labeled,
• all vertices in Γ are assigned labels in {ℓ(v_0), ℓ(v_1), ..., ℓ(v_n)}, and
• the labeling restricted to each face of Δ(v_0, ..., v_n) is a Sperner labeling
there.


LEMMA 5.4.9 (Sperner’s Lemma for general n). Let ℓ be a Sperner labeling
of the vertices in Γ, where Γ is a subdivision of the n-simplex Δ(v_0, v_1, ..., v_n).
Then the number of properly labeled simplices in Γ is odd.


PROOF. We prove the lemma by induction on n. The cases n = 1, 2 were
proved in §5.2.2. For n > 2, consider a Sperner labeling of Γ. Call an (n - 1)-face
good if its vertex labels are ℓ(v_0), ..., ℓ(v_{n-1}).

Let g denote the number of good inner faces; let g_∂ be the number of good
outer faces on Δ(v_0, ..., v_{n-1}), and let N_j be the number of simplices in Γ with
labels {ℓ(v_i)}_{i=0}^{n-1} and ℓ(v_j). Counting pairs

(simplex in Γ, good face of that simplex),

by the remark preceding Lemma 5.4.5, we obtain

2 Σ_{j=0}^{n-1} N_j + N_n = 2g + g_∂.

Since g_∂ is odd by the inductive hypothesis, so is N_n.


Notes 


In his 1950 Ph.D. thesis, John Nash proved the existence of an equilibrium using
Brouwer’s Fixed Point Theorem [Bro11]. In the journal publication [Nas50a], he used
Kakutani’s Fixed Point Theorem instead. According to [OR14], the proof of Brouwer’s
Theorem from Sperner’s Lemma is due to Knaster, Kuratowski, and Mazurkiewicz.

The proof of Brouwer’s theorem via Hex is due to David Gale [Gal79]. Theorem 5.2.2 is
due to [Ede62]. See the book by Border for a survey of fixed-point theorems and
their applications.
See [Rud64] for a discussion of general metric spaces.


Exercises 


5.1. Fill in the details showing that, in a symmetric game, with A = B^T, there
is a symmetric Nash equilibrium. As suggested in the text, use the set
D = {(x, x) : x ∈ Δ_n} in place of K in the proof of Nash’s Theorem.


5.2. Show that the map T : R → R given by
1
John Nash


decreases distances but has no fixed point. 


5.3. Use Lemma 5.2.11(ii) to show that there is no retraction from a ball to a
sphere.


5.4. Show that there is no retraction from a simplex to its boundary directly 
from Sperner’s Lemma, and use this to give an alternative proof of Brouwer’s 
Theorem. (This is equivalent to the previous exercise because a simplex is 
homeomorphic to a ball.) 


5.5. Use Brouwer’s Theorem to show the following: Let B = B(0,1) be the
closed ball in R^d. There is no retraction from B to its boundary ∂B.


S 5.6. Show that any d-simplex in R^d contains a ball.


S 5.7. Let K ⊂ R^d be a compact convex set which contains a d-simplex. Show
that K is homeomorphic to a closed ball.


CHAPTER 6


Games in extensive form 


One of the key features of real-life games is that they take place over time 
and involve interaction, often with players taking turns. Such games are called 
extensive-form games. 


6.1. Introduction 


We begin with an example.


EXAMPLE 6.1.1 (Subtraction). Starting with a pile of four chips, two players 
alternate taking one or two chips. Player I goes first. The player who removes the 
last chip wins. 


FIGURE 6.1. A game tree corresponding to the Subtraction game: Each leaf 
is labeled with the payoffs of the players. At nodes IIc, Ic, and Id, there is 
only one action that can be taken. At node Ib, player I loses if she removes 
one chip and wins if she removes two, so her choice, highlighted in the figure, 
is obvious. Proceeding up the tree, at node IIb, player II loses if he removes 
one chip and wins if he removes two, so again his choice is obvious. At node 
IIa, he loses either way, so in fact, his strategy at this node doesn’t matter 
(indicated by the dots on the edges). Finally, at node Ia, player I wins if she 
removes one chip and loses if she removes two. 


A natural way to find the best strategy for each player in a simple game like this
is to consider the game tree, a representation of how the game unfolds, and then
apply backward induction, that is, determine what action to play from the leaves
up. At each node, the player will pick the action that leads to the highest payoff.
Since we consider nodes in order of decreasing depth, when a node is considered, the
payoffs that player will receive for each action he might take are already determined,
and thus, the best response is determined. Figure 6.1 illustrates this process for
the Subtraction game and shows that player I has a winning strategy.
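
Backward induction on the Subtraction game can be sketched in a few lines of Python. This is only an illustration under the rules of Example 6.1.1; the function name solve and the encoding of payoffs as +1 (player I wins) and -1 (player II wins) are choices made here, not notation from the text.

```python
def solve(chips, player):
    """Backward induction for the Subtraction game of Example 6.1.1.

    Returns (payoff to player I, number of chips to remove) when
    `player` (1 or 2) is to move with `chips` chips left.
    """
    best_value, best_move = None, None
    for take in (1, 2):
        if take > chips:
            continue
        if take == chips:
            # The mover removes the last chip and wins.
            value = 1 if player == 1 else -1
        else:
            # Otherwise the value is already determined deeper in the tree.
            value, _ = solve(chips - take, 3 - player)
        # Player I maximizes her payoff; player II minimizes it (zero-sum).
        better = (best_value is None
                  or (player == 1 and value > best_value)
                  or (player == 2 and value < best_value))
        if better:
            best_value, best_move = value, take
    return best_value, best_move


value, move = solve(4, 1)
print(value, move)
```

From the root (four chips, player I to move) the sketch returns payoff 1 and the move "remove one chip", in agreement with the caption of Figure 6.1.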

Given an extensive-form game, we can in principle list the possible pure strategies
of each of the players and the resulting payoffs. (This is called normal form.)
In the Subtraction game, a strategy for player I specifies her action at node Ia and
her action at node Ib. Similarly for player II. The resulting normal-form game is
the following (where we show only the payoff to player I since this is a zero-sum
game). For example, the row labeled 1a, 2b means that player I removes one chip at
node Ia and two chips at node Ib:

                             player II
                   1a, 1b   1a, 2b   2a, 1b   2a, 2b
player I  1a, 1b     -1       -1        1        1
          1a, 2b      1        1        1        1
          2a, 1b      1       -1        1       -1
          2a, 2b      1       -1        1       -1

DEFINITION 6.1.2. A k-player finite extensive-form game is defined by a
finite, rooted tree T. Each node in T represents a possible state in the game,
with leaves representing terminal states. Each internal (nonleaf) node v in T is
associated with one of the players, indicating that it is his turn to play if/when
v is reached. The edges from an internal node to its children are labeled with
actions, the possible moves the corresponding player can choose from when the
game reaches that state. Each leaf/terminal state results in a certain payoff for
each player. We begin with games of complete information, where the rules of the
game (the structure of the tree, the actions available at each node, and the payoffs
at each leaf) are common knowledge¹ to all players.

A pure strategy for a player in an extensive-form game specifies an action 
to be taken at each of that player’s nodes. A mixed strategy is a probability 
distribution over pure strategies. 


The kind of equilibrium that is computed by backward induction is called a 
subgame-perfect equilibrium because the behavior in each subgame is also
an equilibrium. (Each node in the game tree defines a subgame, the game that 
would result if play started at that point.) 


EXAMPLE 6.1.3 (Line-Item Veto). Congress and the President are at odds 
over spending. Congress prefers to increase military spending (M), whereas the 
President prefers a jobs package (J). However, both prefer a package that includes 
military spending and jobs to a package that includes neither. The following table 
gives their payoffs: 


             Military   Jobs   Both   Neither
Congress        4        1      3        2
President       1        4      3        2


1 That is, each player knows the rules, he knows the other players know the rules, he knows
that they know that he knows the rules, etc.


A line-item veto gives the President the power to delete those portions of a
spending bill he dislikes. Surprisingly though, as Figure 6.2 shows, having this
power can lead to a less favorable outcome for the President.
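
The effect of the line-item veto can be checked by backward induction on the payoff table above. The following sketch rests on a modeling assumption made here: Congress sends one of the four bills, and the President either signs it, vetoes it entirely (resulting in neither program), or, with the line-item veto, strikes one part of the combined bill. The names PAYOFF, NO_LINE_ITEM, LINE_ITEM, and subgame_perfect_outcome are illustrative.

```python
# Payoffs (Congress, President) for each final outcome, from Example 6.1.3.
PAYOFF = {"military": (4, 1), "jobs": (1, 4), "both": (3, 3), "neither": (2, 2)}

# What the President can turn each bill into. Signing keeps the bill and a
# full veto yields "neither"; a line-item veto can also strike one part of "both".
NO_LINE_ITEM = {bill: [bill, "neither"] for bill in PAYOFF}
LINE_ITEM = dict(NO_LINE_ITEM, both=["both", "military", "jobs", "neither"])


def subgame_perfect_outcome(options):
    """Backward induction: the President best-responds to each bill,
    then Congress picks the bill whose induced outcome it likes best."""
    response = {bill: max(outs, key=lambda o: PAYOFF[o][1])
                for bill, outs in options.items()}
    bill = max(options, key=lambda b: PAYOFF[response[b]][0])
    return response[bill]


for name, options in (("without", NO_LINE_ITEM), ("with", LINE_ITEM)):
    outcome = subgame_perfect_outcome(options)
    print(name, "line-item veto:", outcome, PAYOFF[outcome])
```

Under these assumptions the President's payoff is 3 without the line-item veto and 2 with it: once he can strike military spending from a combined bill, Congress no longer has a reason to offer one.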


In the games we’ve just discussed, we focused on subgame-perfect equilibria. 
However, as the following example shows, not all equilibria have this property. 


EXAMPLE 6.1.4 (Mutual Assured Destruction (MAD)). Two countries, 
say A and B, each possess nuclear weapons. A is aggressive and B is benign. 
Country A chooses between two options. The first is to escalate the arms race, 
e.g., by firing test missiles, attacking a neighboring country, etc. The second is to 
do nothing and simply maintain the peace. If A escalates, then B has two options: 
retaliate, or back down. Figure 6.3 shows how the game might evolve.

If A believes that B will retaliate if she escalates, then her best action is to 
maintain the peace. Thus (maintain the peace, retaliate), resulting in payoffs of 
(0,0), is a Nash equilibrium in this game. However, it is not a subgame-perfect equi- 
librium since in the subgame rooted at B’s node, B’s payoff is maximized by backing 
down, rather than retaliating — the subgame-perfect equilibrium is (escalate, back 
down). 

Which equilibrium makes more sense? The threat by B to retaliate in the event 
that A escalates may or may not be credible since it will result in a significantly 
worse outcome to B than if he responds by backing down. Thus, A may not believe 
that B will, in fact, respond this way. On the other hand, nuclear systems are 
sometimes set up to automatically respond in the event of an attack, precisely to 
ensure that the threat is credible. This example illustrates the importance of being 
able to commit to a strategy. 


REMARK 6.1.5. The structure of the extensive game shown in Figure 6.3 comes
up in many settings. For example, consider a small and efficient airline (player A) 
trying to decide whether to offer a new route that encroaches on the territory of 
a big airline (player B). Offering this route corresponds to “escalating”. Player 
B can then decide whether or not to offer a discount on its corresponding flights 
(retaliate) or simply cede this portion of the market (back down). 


EXAMPLE 6.1.6 (Centipede). There is a pot of money that starts out with $4 
and increases by a dollar each round the game continues. Two players take turns. 
When it is a player’s turn and the pot has $p, that player can either split the pot in
his favor by taking $⌊(p + 4)/2⌋ (the “greedy” strategy), or allow the game to continue
(the “continue” strategy) enabling the pot to increase.

Figure 6.4 shows that the unique subgame-perfect equilibrium is for the players
to be greedy at each step. If, indeed, they play according to the subgame-perfect
equilibrium, then player I receives $4 and player II gets nothing, whereas if they
cooperate, they each end up with $50. (See Exercise 6.1.)
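
The backward induction behind this claim can be sketched as follows. The sketch makes three assumptions that should be read as such: the greedy mover at pot size p receives ⌊(p + 4)/2⌋ and the other player the remainder (this matches the payoffs listed in Figure 6.4), player I moves at even pot sizes and player II at odd ones, and if nobody is ever greedy the players end with $50 each. The function name centipede and its parameters are illustrative.

```python
def centipede(first_pot=4, last_pot=99, final_payoffs=(50, 50)):
    """Backward induction for the Centipede game of Example 6.1.6.

    Returns a dict mapping pot size -> (action, (payoff I, payoff II)),
    where the payoffs are those reached from that node under the computed plan.
    """
    plan = {}
    continuation = final_payoffs          # payoffs if the last mover continues
    for p in range(last_pot, first_pot - 1, -1):
        mover = 0 if (p - first_pot) % 2 == 0 else 1   # 0 = player I, 1 = player II
        share = (p + 4) // 2                            # greedy mover's take
        greedy = (share, p - share) if mover == 0 else (p - share, share)
        if greedy[mover] >= continuation[mover]:
            plan[p] = ("greedy", greedy)
        else:
            plan[p] = ("continue", continuation)
        continuation = plan[p][1]
    return plan


plan = centipede()
print(plan[99], plan[4])
```

At every pot size the mover compares his greedy share with what he would get if the opponent were greedy one step later, and the greedy share is always larger, so the induction unravels all the way to the root with payoffs (4, 0).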

This equilibrium is counterintuitive. Indeed, laboratory experiments have shown 
that this equilibrium rarely arises when “typical” humans play this game. On the 
other hand, when the experimental subjects were chess players, the subgame-perfect 
outcome did indeed arise. Perhaps this is because chess players are more adept at 
backward induction. (See the notes for some of the relevant references.) 

Regardless, a possible explanation for the fact that the subgame-perfect equi-
librium does not arise in typical play is that the game is simply unnatural. It is not
necessarily reasonable that the game goes on for a very long, but fixed, number n

[Figure 6.2: two game trees, without and with the line-item veto. In each, Congress
sends the President one of four bills (neither, military, jobs, M+J). Without the
line-item veto the President can only veto or sign; with it, he can also veto M
or J alone in the M+J bill.]

FIGURE 6.2. The top part of the figure shows the game from Example 6.1.3
without the line-item veto. The bottom part of the figure shows the game
with the line-item veto. The highlighted edges show the actions taken in
the subgame-perfect equilibrium: Without the line-item veto, the result is a
military and jobs bill, with payoffs (3, 3). With the line-item veto, the President
would strike military spending from such a bill and would veto a military-only
bill, so the result is no new spending, with payoffs (2, 2).
[Figure 6.3: game tree. Country A chooses between maintain peace, with payoffs (0, 0),
and escalate; after escalate, country B chooses between retaliate and back down,
where backing down gives payoffs (1, -1).]


FIGURE 6.3. In the MAD game, (maintain peace, retaliate) is a Nash equilib-
rium which is not subgame-perfect.


pot sizes:           4       5       6       7     ...     96        97        98        99
payoffs if greedy: (4, 0)  (1, 4)  (5, 1)  (2, 5)  ...  (50, 46)  (47, 50)  (51, 47)  (48, 51)


FIGURE 6.4. The top part of the figure shows the game and the resulting 
payoffs at each leaf. At each node, the “greedy” strategy consists of following 
the downward arrow, and the “continue” strategy is represented by the arrow 
to the right. Backward induction from the node with pot-size 99 shows that 
at each step the player is better off being greedy. 


of rounds, and that it is common knowledge to all players that n is the number of 
rounds. 


The extensive-form games we have seen so far are games of perfect informa- 
tion. At all times during play, a player knows the history of previous moves and 
hence which node of the tree represents the current state of play. In particular, she 
knows, for each possible sequence of actions that players take, exactly what payoffs 
each player will obtain. 

In such games, the method of backward induction applies. Since this method 
leads to a strategy in which play at each node in the game tree is a best response 
to previous moves of the other players, we obtain the following: 


THEOREM 6.1.7. Every finite extensive-form game of perfect information has a 
subgame-perfect pure Nash equilibrium which can be computed by backward induc- 
tion. 


EXERCISE 6.a. Prove Theorem 6.1.7.


6.2. Games of imperfect information


EXAMPLE 6.2.1. Recall the game of Chicken with the following payoff matrix: 


                           player II
                     Swerve (S)   Drive (D)
player I  Swerve (S)   (1, 1)      (-1, 2)
          Drive (D)    (2, -1)     (-M, -M)


FIGURE 6.5. The figure shows an extensive-form version of the Chicken game.
Player II’s nodes are in the same information set because player I’s and
player II’s actions occur simultaneously. Therefore, player II cannot distin-
guish which of two states he is in and must choose the same action at both.
The figure on the right shows the version of Chicken in which player I’s car is
so easy to maneuver that she can escape at the very last minute after seeing
player II’s choice.


This, and any normal-form game, can be represented as an extensive-form
game, as shown in Figure 6.5. When player II moves, he doesn’t know what his
opponent’s move was. We capture this by defining an information set, consisting
of the two player II nodes, and insisting that player II’s action is the same at both
nodes.

Consider now a variant of the game in which player I has a sports car that can 
escape collision in the last second, whereas player II’s car cannot. This leads to the 
game shown on the right-hand side of Figure 6.5.

If we reduce to normal form, we obtain the following matrix, which is just
the original matrix shown above, with one row repeated, and thus has the same
equilibria:

                                 player II
                           Swerve (S)   Drive (D)
player I  Swerve (S)         (1, 1)      (-1, 2)
          Drive/Drive (DD)   (2, -1)     (-M, -M)

However, at the danger node, after players I and II both choose Drive, the Escape
strategy is dominant for player I. Thus, the subgame-perfect equilibria are strategy
pairs where player I escapes at the danger node and player II always drives. It is
still a Nash equilibrium for player I to always drive and player II to swerve, but this
equilibrium includes the “incredible threat” that player I will drive at the danger
node.² The reduction from extensive form to normal form suppresses the crucial timing
of the players’ decisions.


In an extensive game of imperfect information (but still complete informa- 
tion), each player knows the payoffs that all players will get for each possible action 
sequence, but does not know all actions that have been taken. This happens either 
because an opponent’s action occurs simultaneously or simply because the player
is not privy to information about what an opponent is doing. 

Information sets are used to model the uncertainty a player has about which 
node of the tree the game is at when it’s his turn to choose an action: 


DEFINITION 6.2.2. In an extensive-form game, a player’s nodes are partitioned
into information sets. (In games of perfect information, each information set is
a single node.) For any player i and any two nodes v, w in an information set S
of that player, the set of actions available at v is identical to the set of actions
available at w, and the same action must be selected at both nodes.

For any node v associated with player i, let $\mathcal{I}_1(v), \mathcal{I}_2(v), \ldots, \mathcal{I}_t(v)$ be the
information sets of i along the path to v, and let $a_j(v)$ be the action i took at $\mathcal{I}_j(v)$.
The information available to player i at v is the history of information sets and
actions he took, which we denote by $H(v) := \{\mathcal{I}_j(v)\}_{j=1}^{t} \cup \{a_j(v)\}_{j=1}^{t-1}$.

A pure strategy for a player in a game with information sets defines an action 
for that player at each of his information sets, and, as always, a mixed strategy 
is a probability distribution over pure strategies. 


Note that in a game of imperfect information, backward induction usually cannot
be employed to find an equilibrium. For example, the optimal strategy for
player I depends on which node in the information set she is at, which depends 
on player II’s strategy, and that decision is made at parents of player I’s nodes in 
the tree. Indeed, once nonsingleton information sets are present, the notion of a 
subgame has to be defined more carefully: Subgames can’t split up nodes in the 
same information set. 


REMARK 6.2.3. Only a forgetful player would consider two nodes with different 
histories to be in the same information set. We restrict attention to players with 
perfect recall. For such a player, if two nodes v and w are in the same information 
set, then H(v) = H(w). 


6.2.1. Behavioral strategies. The use of the normal form version of an ex- 
tensive game as a method for finding a Nash equilibrium is computationally complex 
for games that involve multiple rounds of play — the number of pure strategies a 
player i has is exponential in the number of information sets associated to i. A 
more natural way to construct a mixed strategy is to define the behavior at each 
information set independently. This is called a behavioral strategy. 


2 So she is worse off for having a better car.


DEFINITION 6.2.4. A behavioral strategy $b_i$ for a player i in an extensive-
form game is a map that associates to each information set I of i a probability
distribution $b_i(I)$ over the actions available to i at I. (For $v \in I$, we write $b_i(v) =
b_i(I)$.)

Every behavioral strategy $b_i$ induces a corresponding mixed strategy, obtained
by choosing an action independently at every information set S of player i according
to the distribution $b_i(S)$.


REMARK 6.2.5. Some mixed strategies are not induced by a behavioral strategy 
because of dependence between the choice of actions at different information sets. 
(See Figure 6.6.) Thus, while a Nash equilibrium in mixed strategies always exists
via reduction to the normal form case, it is not obvious that a Nash equilibrium in 
behavioral strategies exists. 


FIGURE 6.6. In this example, player II’s mixed strategy puts probability $p_1$
on the two left actions and probability $1 - p_1$ on the two right actions. This
mixed strategy is not induced by any behavioral strategy because the actions
player II takes at its two nodes are correlated. Notice though that if player II’s
strategy was to play the left action with probability $p_1$ and the right action
with probability $1 - p_1$ at each node independently instead, then for any fixed
player I strategy, her expected payoff would be the same as it is under the
correlated strategy.


To show that Nash equilibria in behavioral strategies exist, we will need the 
following definition: 


DEFINITION 6.2.6 (Realization-equivalence). Two strategies $s_i$ and $s_i'$ for
player i in an extensive-form game are realization-equivalent if for each strategy $s_{-i}$
of the opponents and every node v in the game tree, the probability of reaching v
when strategy profile $(s_i, s_{-i})$ is employed is the same as the probability of reaching
v when $(s_i', s_{-i})$ is employed.

REMARK 6.2.7. It is enough to verify realization-equivalence for opponent strat-
egy profiles $s_{-i}$ that are pure.


THEOREM 6.2.8. Consider an extensive game of perfect recall. Then for any
player i and every mixed strategy $s_i$, there is a realization-equivalent $s_i'$ that is in-
duced by a behavioral strategy $b_i$. Hence, for every possible strategy of the opponents
$s_{-i}$, every player i’s expected utility under $(s_i, s_{-i})$ is the same as his expected utility
under $(s_i', s_{-i})$.


FIGURE 6.7. This figure gives a simple example of the construction of the
behavioral strategy at nodes A and B. The labels on the edges represent the
transition probabilities in the behavioral strategy.


PROOF. Let $s_i(A)$ denote the probability the mixed strategy $s_i$ places on a set
of pure strategies A. If v is a player i node, let $v_1, \ldots, v_{t-1}$ be the player i nodes
on the path to v, and let $a_j$ be the action at $v_j$ leading towards v. Let $\Omega(v)$ be the
set of pure strategies of player i where he plays $a_j$ at $v_j$ for j < t. The strategies in
$\Omega(v)$ where he also plays action a at v are denoted $\Omega(v, a)$. The behavioral strategy
$b_i(v)$ is defined to be the conditional distribution over actions at v, given that v is
reached. Thus the probability $b_i(v)_a$ of taking action a is

\[ b_i(v)_a = \frac{s_i(\Omega(v, a))}{s_i(\Omega(v))} \]

if the denominator is nonzero. Otherwise, let $b_i(v)$ be the uniform distribution over
the actions at v. (See Figure 6.7.)

The key observation is that, by the assumption of perfect recall, $b_i(v)_a =
b_i(w)_a$ if v and w are in the same information set I, and therefore this is a valid
behavioral strategy.

Finally, for a fixed pure strategy $s_{-i}$, it follows by induction on the depth of
the node v that the probability of reaching that node using the behavioral strategy
$b_i$ is the same as the probability of reaching that node using $s_i$.
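
The construction in the proof, in which the probability of action a at node v is the mass of the pure strategies that reach v and play a, divided by the mass of those that reach v, can be sketched in code. The example game is hypothetical and is not the one in Figure 6.7: the player has a node A with actions L and R, and a node B, reached only after L at A, with actions l and r. For simplicity the sketch treats every node as its own information set. All names (behavioral_from_mixed, mixed, path, actions) are introduced here.

```python
from fractions import Fraction


def behavioral_from_mixed(mixed, path, actions):
    """mixed: list of (pure strategy as dict node -> action, probability).
    path[v]: list of (earlier own node, action taken there) on the way to v.
    actions[v]: actions available at v.
    Returns b[v][a], the conditional probability of playing a at v given
    that the player's own choices lead to v (uniform if that has mass 0)."""
    b = {}
    for v in actions:
        # Pure strategies consistent with reaching v.
        reach = [(s, p) for s, p in mixed
                 if all(s[u] == a for u, a in path[v])]
        denom = sum(p for _, p in reach)
        if denom == 0:
            b[v] = {a: Fraction(1, len(actions[v])) for a in actions[v]}
        else:
            b[v] = {a: sum(p for s, p in reach if s[v] == a) / denom
                    for a in actions[v]}
    return b


# Hypothetical mixed strategy over three pure strategies.
mixed = [({"A": "L", "B": "l"}, Fraction(1, 2)),
         ({"A": "L", "B": "r"}, Fraction(1, 4)),
         ({"A": "R", "B": "l"}, Fraction(1, 4))]
path = {"A": [], "B": [("A", "L")]}
actions = {"A": ["L", "R"], "B": ["l", "r"]}

b = behavioral_from_mixed(mixed, path, actions)
print(b)
```

In this example the behavioral strategy plays L with probability 3/4 at A and l with probability 2/3 at B, so the leaf after (L, l) is reached with probability 3/4 · 2/3 = 1/2, the same as under the mixed strategy.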


COROLLARY 6.2.9. In a finite extensive game of perfect recall, there is a Nash 
equilibrium in behavioral strategies. 


6.3. Games of incomplete information 


Sometimes a player does not know exactly what game he is playing, e.g., how 
many players there are, which moves are available to the players, and what the 
payoffs at terminal nodes are. For example, in a game of poker, each player doesn’t 
know which cards his opponents received, and therefore doesn’t know the payoffs in 
each terminal state; in an eBay auction, a player doesn’t know how many competing 
bidders there are or how much they value the object being auctioned. These are 
games of incomplete information.


FIGURE 6.8. Poker is a game of incomplete information.


In such a game, there is not much a player can do except guard against the 
worst case. Thus, a natural strategy in such a game is a safety strategy, wherein a 
player chooses a strategy which maximizes his payoff in the worst case. 


6.3.1. Bayesian games. In many situations, however, the players have prob- 
abilistic prior information about which game is being played. Under this assump- 
tion, a game of incomplete information can be converted to a game of complete but 
imperfect information using moves by nature and information sets. 


EXAMPLE 6.3.1 (Fish-Selling Game). Fish being sold at the market is fresh 
with probability 2/3 and old otherwise, and the customer knows this. The seller 
knows whether the particular fish on sale now is fresh or old. The customer asks 
the fish-seller whether the fish is fresh, the seller answers, and then the customer 
decides to buy the fish or to leave without buying it. The price asked for the fish 
is $12. It is worth $15 to the customer if fresh and nothing if it is old. Thus, if the 
customer buys a fresh fish, her gain is $3. The seller bought the fish for $6, and if 
it remains unsold, then he can sell it to another seller for the same $6 if it is fresh, 
and he has to throw it out if it is old. On the other hand, if the fish is old, the 
seller claims it to be fresh, and the customer buys it, then the seller loses $R in
reputation. The game tree is depicted in Figure 6.10.


The seller clearly should not say “old” if the fish is fresh. Hence we should 
examine two possible pure strategies for him: FF means he always says “fresh”; 
OF means he always tells the truth. For the customer, there are four ways to react
to what she might hear. Hearing “old” means that the fish is indeed old, so it is
clear that she should leave in this case. Thus two rational strategies remain: BL 
means she buys the fish if she hears “fresh” and leaves if she hears “old”; LL means 
she always leaves. Here are the expected payoffs for the two players, averaged over 
the randomness coming from the actual condition of the fish:


FIGURE 6.9. The seller knows whether the fish is fresh; the customer only
knows the probability.


FIGURE 6.10. The game tree for the Fish-Selling game. The top node in the
tree is a move by nature, with the outcome being fresh with probability 2/3
or old with probability 1/3.


                     customer
                 BL              LL
seller  FF  (6 - R/3, -2)     (-2, 0)
        OF     (2, 2)         (-2, 0)


We see that if losing reputation does not cost too much in dollars, i.e., if R < 12, 
then there is only one pure Nash equilibrium: FF against LL. However, if R > 12, 
then the (OF, BL) pair also becomes a pure equilibrium, and the payoffs to both 
players from this equilibrium are much higher than the payoffs from the other 
equilibrium. 
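
The expected payoffs and the equilibrium claims can be recomputed from the description of the game. The sketch below encodes the numbers of Example 6.3.1 (price $12, value $15 if fresh, cost $6, resale at $6 if fresh, reputation loss R); the function names payoff_matrix and pure_nash and the dictionary encoding of strategies are choices made here. Equilibria are tested with weak inequalities.

```python
from fractions import Fraction


def payoff_matrix(R):
    """Expected (seller, customer) payoffs for reputation cost R."""
    prior = {"fresh": Fraction(2, 3), "old": Fraction(1, 3)}
    # Seller strategies: what he says in each state of the fish.
    says = {"FF": {"old": "fresh", "fresh": "fresh"},
            "OF": {"old": "old", "fresh": "fresh"}}
    # Customer strategies: reaction to what she hears.
    reacts = {"BL": {"fresh": "buy", "old": "leave"},
              "LL": {"fresh": "leave", "old": "leave"}}

    def outcome(state, claim, action):
        if action == "buy":
            seller = 12 - 6 - (R if state == "old" and claim == "fresh" else 0)
            customer = (15 if state == "fresh" else 0) - 12
        else:
            seller = 0 if state == "fresh" else -6   # resell for $6, or throw it out
            customer = 0
        return seller, customer

    matrix = {}
    for s, say in says.items():
        for c, react in reacts.items():
            total = [Fraction(0), Fraction(0)]
            for state, prob in prior.items():
                u = outcome(state, say[state], react[say[state]])
                total = [total[0] + prob * u[0], total[1] + prob * u[1]]
            matrix[s, c] = tuple(total)
    return matrix


def pure_nash(matrix):
    """All pure strategy pairs from which neither player gains by deviating."""
    sellers = sorted({s for s, _ in matrix})
    customers = sorted({c for _, c in matrix})
    return [(s, c) for s in sellers for c in customers
            if all(matrix[s, c][0] >= matrix[t, c][0] for t in sellers)
            and all(matrix[s, c][1] >= matrix[s, d][1] for d in customers)]


print(pure_nash(payoff_matrix(6)), pure_nash(payoff_matrix(18)))
```

With the illustrative values R = 6 and R = 18, the first call returns only (FF, LL) and the second returns both (FF, LL) and (OF, BL), in line with the threshold R = 12 discussed above.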




[Figure 6.11: game tree. Nature decides, with probability 0.5 each, whether firm II
can produce a competitive product. Firm II then announces a competitive product or,
if it cannot produce one, may cede the market, with payoffs (16, 0). After an
announcement, player I chooses Stay or Sell without knowing which state she is in.]


FIGURE 6.11. The figure shows the Large Company vs Startup game. Prior 
to the beginning of the game, player I announces her new technology. At the 
beginning of the game, there is a move by nature, which determines whether 
or not II actually can pull together a competitive product. Only player II is 
privy to the outcome of this move by nature. The two nodes at which player 
I makes a move form a single information set: player I does not know which 
of these states she is in. All she knows is that II has announced a competitive 
product, and knowing only that, she has to decide between competing with 
the giant or letting the giant buy her out. Thus, her strategy is the same at 
both nodes in the information set. 


EXAMPLE 6.3.2 (Large Company versus Startup). A startup (player I) 
announces an important technology threatening a portion of the business that a 
very large company (player II) engages in. Given the resources available to II, 
e.g., a very large research and development group, it is possible that II will be 
able to pull together a competitive product in short order. One way or another, 
II may want to announce that a competitive product is in the works regardless of 
its existence, simply to intimidate the startup and motivate it to accept a buyout 
offer. The resulting game tree, which has a move by nature and an information set,
is shown in Figure 6.11. We can reduce the game to normal form by averaging over
the randomness:


                                  player II
                        announce/cede   announce/announce
player I  stay in (I)      (6, 10)            (8, 8)
          sell out (O)     (10, 6)            (4, 12)

For example, if player I’s strategy is to stay in and player II’s strategy is an-
nounce/cede (i.e., announces a competitive product only if he can produce a com-
petitive product), then the payoffs are the average of (−4, 20) and (16, 0), namely (6, 10).


A Bayesian game is an extensive-form game of imperfect information, with 
a first move by nature and probabilities that are in common knowledge to all the 
players. Different players may have different information about the outcome of the 
move by nature. This is captured by their information sets. 

We summarize the procedure for constructing a normal-form game associ-
ated to a two-player Bayesian game G: The actions available to player I in the
normal-form game are her pure strategies in G, and similarly for player II. The
payoff matrices A and B for players I and II have entries

\[ A(s_I, s_{II}) = \mathbb{E}_M[u_I(s_I, s_{II}, M)] \quad\text{and}\quad B(s_I, s_{II}) = \mathbb{E}_M[u_{II}(s_I, s_{II}, M)], \]

where $u_i(s_I, s_{II}, M)$ is the payoff to player i when player I plays pure strategy $s_I$,
player II plays pure strategy $s_{II}$, and M is the move by nature in G.


6.3.2. Signaling. 


EXAMPLE 6.3.3 (Lions and Antelopes). Antelopes have been observed to 
jump energetically when they notice a lion. Why do they expend energy in this way? 
One theory is that the antelopes are signaling danger to others at some distance, 
in a community-spirited gesture. However, the antelopes have been observed doing 
this even when there are no other antelopes nearby. The currently accepted theory 
is that the signal is intended for the lion, to indicate that the antelope is in good 
health and is unlikely to be caught in a chase. This is the idea behind signaling. 




FIGURE 6.12. Lions and antelopes. Given that an antelope doesn’t stot, chas-
ing yields the lioness a positive payoff. Therefore, a healthy antelope is moti-
vated to stot.


FIGURE 6.13. An antelope stotting to indicate its good health.


Consider the situation of an antelope catching sight of a lioness in the distance. 
Suppose there are two kinds of antelope, healthy (H) and weak (W). A lioness can 
catch a weak antelope but has no chance of catching a healthy antelope (and would
expend a lot of energy if she tried).

This can be modeled as a combination of two simple games ($A^H$ and $A^W$),
depending on whether the antelope is healthy or weak, in which case the antelope
has only one strategy (to run if chased), but the lioness has the choice of chasing
(C) or ignoring (I):

                         antelope                                 antelope
                      run if chased                            run if chased
A^H =  lioness chase    (-1, -1)         A^W =  lioness chase    (5, -100)
               ignore    (0, 0)                         ignore    (0, 0)

The lioness does not know which game she is playing, and if 20% of the
antelopes are weak, then the lioness can expect a payoff of (.8)(−1) + (.2)(5) = .2 by
chasing. However, the antelope does know, and if a healthy antelope can credibly
convey that information to the lioness by jumping very high,³ both will be better
off, the antelope much more than the lioness!


6.3.3. Zero-sum games of incomplete information. 


EXAMPLE 6.3.4 (A simultaneous randomized game). A zero-sum game
is chosen by a fair coin toss. The players then make simultaneous moves. These
moves are revealed and then they play a second round of the same game before any
payoffs are revealed:

                  player II                          player II
                   L     R                            L     R
A^H =  player I U  -1     0        A^T =  player I U   0     0
                D   0     0                        D   0    -1

3 A weak antelope cannot jump that high.


If neither player knows the result of the initial coin toss, each player will use the
mixed strategy (1/2, 1/2), and the value of the game to player I (for the two rounds)
is −1/2. Now suppose that player I learns the result of the coin toss before playing
the game. Then she can simply choose the row with all zeros and lose nothing,
regardless of whether player II knows the coin toss as well.


Next consider the same story, but with matrices

                  player II                          player II
                   L     R                            L     R
A^H =  player I U   1     0        A^T =  player I U   0     0
                D   0     0                        D   0     1


Again, without knowing the result of the coin toss, the value to player I in each
round is 1/4. If player I is informed of the coin toss at the start, then in the second
round, she will be greedy, i.e., choose the row with the 1 in it. The question remains
of what she should do in the first round.

Player I has a simple strategy that will get her 3/4: ignore the coin
flip on the first round (and choose U with probability 1/2), but then, on the second
round, be greedy.

In fact, 3/4 is the value of the game. A strategy for player II that shows this
is the following: In the first round, he plays L with probability 1/2. In the second
round, he flips a fair coin. If it comes up heads, then he assumes that player I
played greedily in the first round⁴ and he responds accordingly; if it comes up tails,
then he chooses L with probability 1/2. If player I plays greedily in the first round,
then she gets 1/2 in the first round and 1/4 in the second round. If player I is sneaky
(plays D in $A^H$ and U in $A^T$), then she gets 0 in the first round and 3/4 in the second
round. Finally, if player I plays the same action in round 1 for both $A^H$ and $A^T$,
then she will receive 1/4 in that round and 1/2 in the second round.

It is surprising that sometimes the best use of information is to ignore it. 
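
The claim that player II's strategy holds player I to 3/4 can be checked by direct enumeration. The sketch below uses the second pair of matrices (with a single entry 1 in each) and considers each deterministic first-round plan of player I, followed by greedy play in round two; greedy play is a best reply there because the other row pays 0 against every column. The encoding (rows U, D and columns L, R as indices 0, 1) and the function name total_payoff are choices made here.

```python
from fractions import Fraction
from itertools import product

HALF = Fraction(1, 2)
U, D, L, R = 0, 1, 0, 1
A = {"H": [[1, 0], [0, 0]], "T": [[0, 0], [0, 1]]}   # rows U, D; columns L, R
GREEDY = {"H": U, "T": D}                             # the row containing the 1


def total_payoff(first_round):
    """Player I's expected two-round payoff when she plays first_round[coin]
    in round 1 and is greedy in round 2, against the strategy of player II
    described in the text."""
    total = Fraction(0)
    for coin in "HT":
        row1, row2 = first_round[coin], GREEDY[coin]
        # Round 1: II plays L or R with probability 1/2 each.
        round1 = HALF * A[coin][row1][L] + HALF * A[coin][row1][R]
        # Round 2, II's own coin shows heads: he assumes round 1 was greedy,
        # so U signals game H (he answers R) and D signals game T (he answers L).
        guess = R if row1 == U else L
        # Round 2, II's own coin shows tails: he mixes 1/2, 1/2 again.
        mix = HALF * A[coin][row2][L] + HALF * A[coin][row2][R]
        round2 = HALF * A[coin][row2][guess] + HALF * mix
        total += HALF * (round1 + round2)
    return total


for rows in product((U, D), repeat=2):
    print(rows, total_payoff({"H": rows[0], "T": rows[1]}))
```

All four first-round plans (greedy, sneaky, always U, always D) give player I exactly 3/4, so randomizing among them cannot help her either.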


6.3.4. Summary: comparing imperfect and incomplete information. 
Recall that in a game of perfect information, each player knows the entire game tree 
and, whenever it is his turn, he knows the history of all previous moves (including 
any moves by nature). Thus, all information sets are of size one. 

In a game of imperfect information, each player knows the entire game tree 
(including the probabilities associated with any move by nature). A player also 
knows the information set he is in whenever it is his turn. However, there is at 
least one information set of size greater than one. 

In a game of incomplete information, players do not know the entire game tree 
or exactly which game they are playing. This is, in general, an intractable setting 
without further assumptions. 

One way to handle this is to extend the game tree by adding an initial move 
by nature, with a commonly known prior on this move. This approach converts the 
game of incomplete information to a Bayesian game, which is a game of complete 
but imperfect information. 


4 That is, played U in $A^H$ and D in $A^T$.


6.4. Repeated games


A special kind of extensive-form game arises when a regular one-round game 
of simultaneous moves is played repeatedly for some number of rounds. 

For example, recall Prisoner’s Dilemma.⁵ We saw that the unique Nash equi-
librium, indeed dominant strategy, in this game is for both players to defect:


                                player II
                       cooperate (C)   defect (D)
player I  cooperate (C)    (6, 6)        (0, 8)
          defect (D)       (8, 0)        (2, 2)


What if the game is played n times? We assume that each player is trying to 
maximize the sum of his payoffs over the rounds. Both players’ actions in each 
round are revealed simultaneously, and they know the actions taken on previous 
rounds when deciding how to play in the current round. 

As in the one-shot game, in the final round, it will be a dominant strategy for 
each player to defect. Therefore, it is also a dominant strategy for each player to 
defect in the previous round, etc. Backward induction implies that the unique Nash 
equilibrium is to defect in each round. 

It is crucial for the analysis we just gave that the number of rounds of the game 
is common knowledge. But for very large n, this is not necessarily realistic. Rather, 
we would like to model the fact that the number of times the game will be played 
is not known in advance. 

One possibility is to let the game run forever and consider the limsup aver-
age payoff:

\[ \limsup_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} (\text{the player's payoff in round } t). \tag{6.1} \]

(When the limit exists, we will refer to it as the average payoff.)

We emphasize that this is very different from a limit of fixed horizon games. In 
the latter case, a player can select a strategy that depends on the horizon T, while 
if the goal is to maximize the (limiting) average payoff, then the player must select 
one strategy independently of T. 

Another way to assign utility to a player in an infinitely repeated game is to
use a discount factor $\beta < 1$ and consider the discounted payoff:

$$\sum_{t=1}^{\infty} \beta^t \, (\text{the player's payoff in round } t). \tag{6.2}$$

There are two common interpretations for this:

• For each $t \ge 1$, given that the game has lasted for $t - 1$ rounds, it continues
  to the next round with probability $\beta$. Then the probability of still playing
  at time $t$ is $\beta^t$, and equation (6.2) represents the player’s expected payoff.

• A dollar earned today is better than a dollar earned next year since it can
  be enjoyed in the intervening year or invested to earn interest.
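As a quick numerical illustration of (6.2), the following sketch evaluates a truncated discounted sum and compares it with the closed form $c\beta/(1-\beta)$ for a constant payoff $c$ per round. The function names are hypothetical, and truncating the infinite sum at a finite horizon is an assumption made only so the sum can be computed.

```python
def discounted_payoff(payoffs, beta):
    """Sum of beta**t * payoff_t for t = 1, ..., len(payoffs), as in (6.2)."""
    return sum(beta ** t * p for t, p in enumerate(payoffs, start=1))


def constant_stream_value(payoff, beta):
    """Closed form of (6.2) when the same payoff is received in every round."""
    return payoff * beta / (1 - beta)


# Mutual cooperation in the Prisoner's Dilemma above pays 6 per round.
truncated = discounted_payoff([6] * 2000, beta=0.5)
exact = constant_stream_value(6, beta=0.5)
```

With $\beta = 1/2$ the closed form gives $6 \cdot \frac{1/2}{1/2} = 6$, and the truncated sum agrees up to rounding.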

The strategies we consider in repeated games are analogous to behavioral strate- 
gies in extensive-form games: 


5 This version differs from the one in Chapter 4 in that a constant has been added to all
payoffs to make them nonnegative.

DEFINITION 6.4.1. Let $G$ be a $k$-player normal-form game, where player $i$’s
action set is $A_i$. Let $A := \prod_{i=1}^{k} A_i$ be the set of action profiles. A (behavioral)
strategy $s_i$ for player $i$ in the infinitely repeated game $G^\infty$ is a mapping that
for each $t$ assigns to every possible history of actions $H_{t-1} \in A^{t-1}$ a mixed strategy
$s_i(H_{t-1})$ for player $i$ in $G$ (i.e., a distribution over $A_i$) to be played in round $t$.


6.4.1. Repetition with discounting. Consider Iterated Prisoner’s Dilemma⁶
with discount factor $\beta$.


DEFINITION 6.4.2. The Tit-for-Tat strategy in Iterated Prisoner’s Dilemma
is the following:
• Cooperate in round 1.
• For every round $k > 1$, play what the opponent played in round $k - 1$.

This strategy fares surprisingly well against a broad range of competing
strategies. See the notes.


LEMMA 6.4.3. For $\beta > 1/3$, it is a Nash equilibrium in Iterated Prisoner’s
Dilemma for both players to play Tit-for-Tat.


REMARK 6.4.4. The threshold of 1/3 for $\beta$ depends on the specific payoff matrix
used, but the principle applies more broadly.


[Figure: rows of actions for players I and II around a deviation, with payoff
vectors (6,6), (0,8), (2,2), (8,0), versus (6,6), (6,6), ..., (6,6) with no deviation.]

FIGURE 6.14. Illustration of deviation in Tit-for-Tat strategies


PROOF. If both players play Tit-for-Tat, then the payoff to each player in every
round is 6. Suppose though that the first player plays Tit-for-Tat and the second
player deviates. Consider the first round $t$ at which he defects. Suppose first that he
never switches back to cooperating. Then his payoff from $t$ on is
$8\beta^t + 2\sum_{j>t} \beta^j$ versus $6\sum_{j \ge t} \beta^j$ if he had kept on cooperating.
Multiplying both quantities by $(1-\beta)/\beta^t$ turns the comparison into $8 - 6\beta$ versus 6,
so the latter is larger for $\beta > 1/3$.

If he does switch back to cooperating at some round $t' > t$, then his payoff in
rounds $t$ through $t'$ is

$$8\beta^t + 2\sum_{t<j<t'} \beta^j \quad \text{versus} \quad 6\sum_{t \le j \le t'} \beta^j$$

if he doesn’t defect during this period. The latter is also greater when $\beta > 1/3$
(and even for $\beta$ slightly smaller).

Applying this argument to each interval where player II defected proves the
lemma.
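The comparison in the proof can be checked numerically. The sketch below is illustrative only: the strategy functions and the truncation of the infinite sum at 2000 rounds are assumptions, not part of the text. It plays Tit-for-Tat (as player I) against either Tit-for-Tat or a permanent defector (as player II) and reports the discounted payoffs from (6.2).

```python
# Stage-game payoffs (player I, player II) from the Prisoner's Dilemma table above.
PAYOFF = {("C", "C"): (6, 6), ("C", "D"): (0, 8),
          ("D", "C"): (8, 0), ("D", "D"): (2, 2)}


def tit_for_tat(opponent_history):
    """Cooperate in round 1, then copy the opponent's previous action."""
    return "C" if not opponent_history else opponent_history[-1]


def always_defect(opponent_history):
    """Deviation considered first in the proof: defect and never switch back."""
    return "D"


def discounted_payoffs(strategy_1, strategy_2, beta, rounds=2000):
    """Discounted payoffs of both players, truncated after `rounds` rounds."""
    history_1, history_2 = [], []
    total_1 = total_2 = 0.0
    for t in range(1, rounds + 1):
        a1, a2 = strategy_1(history_2), strategy_2(history_1)
        u1, u2 = PAYOFF[(a1, a2)]
        total_1 += beta ** t * u1
        total_2 += beta ** t * u2
        history_1.append(a1)
        history_2.append(a2)
    return total_1, total_2


for beta in (0.2, 0.5):
    stay = discounted_payoffs(tit_for_tat, tit_for_tat, beta)[1]
    deviate = discounted_payoffs(tit_for_tat, always_defect, beta)[1]
    print(f"beta={beta}: cooperate {stay:.3f}, always defect {deviate:.3f}")
```

For $\beta = 0.5 > 1/3$ the deviator does worse than under mutual Tit-for-Tat, while for $\beta = 0.2 < 1/3$ permanent defection pays, matching the threshold in Lemma 6.4.3.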


6 This is the game $G^\infty$, where $G$ is the version of Prisoner’s Dilemma shown at the beginning

The following strategy constitutes a more extreme form of punishment for an
opponent who doesn’t cooperate. 


DEFINITION 6.4.5. The Grim strategy in Iterated Prisoner’s Dilemma is the 
following: Cooperate until a round in which the other player defects, and then 
defect from that point on. 


EXERCISE 6.b. Determine for which values of $\beta$ it is a Nash equilibrium in
Iterated Prisoner’s Dilemma for both players to use the Grim strategy. 


The previous exercise shows that (Grim, Grim) is a Nash equilibrium in Iterated 
Prisoner’s Dilemma if $\beta$ is sufficiently large. In fact, (Grim, Tit-for-Tat) is also a
Nash equilibrium. But these are far from the only equilibria. 

In the next section we characterize the payoffs achievable in a Nash equilibrium. 
To simplify the discussion, we consider average, rather than discounted, payoffs.


6.4.2. The Folk Theorem for average payoffs. Consider two infinite
sequences of actions $a_I$ and $a_{II}$. One way for player II to try to force player I to stick
with $a_I$ is to “punish” player I if she deviates from that sequence. In order for this
threat to work, any gain from deviating must be outweighed by the punishment.


[Figure: player I alternates C, D, C, D, ... while player II cooperates in every
round; the payoff vectors alternate (6,6), (8,0), (6,6), (8,0), ..., for an average
payoff of (7,3).]

FIGURE 6.15. (CD, Grim’) strategy pair without deviation.


EXAMPLE 6.4.6. Consider the Cooperate-Defect (CD) strategy for player I 
in Iterated Prisoner’s Dilemma defined as follows: Alternate between cooperating 
and defecting as long as the other player cooperates. If the other player ever defects, 
defect from that point on. Let Grim’ be the player II strategy that cooperates as 
long as player I alternates between cooperate and defect, but if player I ever defects 
on an odd round, then player II defects henceforth. 

We claim that the strategy pair (CD, Grim’) is a Nash equilibrium that yields 
an average payoff of 7 to player I and 3 to player II. 

To see that these are the average payoffs, observe that if neither player deviates, 
then in alternate rounds, they play (cooperate, cooperate) yielding payoffs of (6, 
6), and (defect, cooperate), yielding payoffs of (8,0) for average payoffs of (7, 3). 

Figures 6.15 and 6.16 illustrate why this is a Nash equilibrium.
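The average payoffs in Example 6.4.6 can be reproduced with a short simulation. This is a sketch under stated assumptions: rounds are numbered from 1, CD cooperates on odd rounds and defects on even rounds (consistent with the payoff sequence (6,6), (8,0), ...), and the function names are hypothetical.

```python
from fractions import Fraction

# Stage-game payoffs (player I, player II).
PAYOFF = {("C", "C"): (6, 6), ("C", "D"): (0, 8),
          ("D", "C"): (8, 0), ("D", "D"): (2, 2)}


def cd_strategy(round_number, opponent_history):
    """Alternate C, D, C, D, ...; defect forever once the opponent has defected."""
    if "D" in opponent_history:
        return "D"
    return "C" if round_number % 2 == 1 else "D"


def grim_prime(round_number, opponent_history):
    """Cooperate unless player I has ever defected on an odd round."""
    for t, action in enumerate(opponent_history, start=1):
        if t % 2 == 1 and action == "D":
            return "D"
    return "C"


def always_defect(round_number, opponent_history):
    """A sample deviation for player I."""
    return "D"


def average_payoffs(strategy_1, strategy_2, rounds):
    """Exact average payoffs of both players over the first `rounds` rounds."""
    history_1, history_2 = [], []
    total_1 = total_2 = 0
    for t in range(1, rounds + 1):
        a1 = strategy_1(t, history_2)
        a2 = strategy_2(t, history_1)
        u1, u2 = PAYOFF[(a1, a2)]
        total_1 += u1
        total_2 += u2
        history_1.append(a1)
        history_2.append(a2)
    return Fraction(total_1, rounds), Fraction(total_2, rounds)


on_path = average_payoffs(cd_strategy, grim_prime, 1000)
after_deviation = average_payoffs(always_defect, grim_prime, 1000)
```

Over an even number of rounds the on-path averages are exactly (7, 3), while a player I who defects in round 1 is punished from round 2 on and averages close to 2.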


The previous example gives a special case of one of the famous folk theorems, 
characterizing the payoffs achievable by Nash equilibria in repeated games. 
We will need two definitions: 


DEFINITION 6.4.7 (Payoff polytope). Let $G$ be a finite $k$-person normal-form
game, where player $i$’s action set is $A_i$. Let $A := A_1 \times A_2 \times \cdots \times A_k$, the set
of action profiles, and $u_i : A \to \mathbb{R}$ the utility of player $i$. For $a \in A$, write
$u(a) = (u_1(a), \ldots, u_k(a))$ for the utility vector in $G$ corresponding to the action
profile $a$. The convex hull of $\{u(a) \mid a \in A\}$ is called the payoff polytope of the
game.


[Figure: once player I deviates from the CD sequence, player II punishes, and
player I's payoff is at most 2 per round henceforth; the payoff vectors shown
before the deviation are (6,6) and (8,0).]

FIGURE 6.16. (CD, Grim’) strategy pair with deviation. The figure shows
that the Cooperate-Defect (CD) player’s average payoff drops by deviating.
Similar analysis shows that Grim’ is a best response to CD.


DEFINITION 6.4.8 (Individually-rational payoff profiles). Let $G$ be a finite
$k$-person game with action sets $A_i$. We say a payoff vector $g = (g_1, \ldots, g_k)$ is
individually rational if each player’s payoff is at least his minmax value, the lowest
payoff his opponents can limit him to. That is, for all $i$,

$$g_i \ge \min_{x_{-i}} \max_{a_i} u_i(a_i, x_{-i}).$$

Note that the strategies $x_j$ in $x_{-i}$ could be randomized. (Recall that $x_j$ is a mixed
strategy for player $j$ in a single round of the game $G$.)
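To make the minmax value concrete, here is a small sketch for the special case of a two-player game in which the opponent has exactly two pure actions (an assumption of this example; the general case needs a linear program). The maximum over the player's own actions is a convex piecewise-linear function of the opponent's mixing probability $q$, so it suffices to evaluate it at $q = 0$, $q = 1$, and the crossing points of the lines. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def minmax_value(payoff_rows):
    """Minmax value for a player whose opponent has two pure actions.

    payoff_rows[a] = (payoff of own action a vs. opponent action 0,
                      payoff of own action a vs. opponent action 1).
    The opponent plays action 0 with probability q and action 1 otherwise.
    """
    lines = [(Fraction(r0), Fraction(r1)) for r0, r1 in payoff_rows]
    candidates = {Fraction(0), Fraction(1)}
    for (a0, a1), (b0, b1) in combinations(lines, 2):
        denom = (a0 - a1) - (b0 - b1)
        if denom != 0:
            q = (b1 - a1) / denom  # where the two payoff lines cross
            if 0 <= q <= 1:
                candidates.add(q)
    return min(max(q * r0 + (1 - q) * r1 for r0, r1 in lines)
               for q in candidates)


# Player I in the Prisoner's Dilemma above: rows are C and D,
# columns are the opponent's actions C and D.
pd_minmax = minmax_value([(6, 0), (8, 2)])
```

For the Prisoner's Dilemma above this returns 2: the opponent holds the player to 2 by defecting, which is why the punishment payoff in Figure 6.16 is at most 2.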


[Figure: a cycle of 12 rounds in which the action pairs with payoffs (6,6), (0,8),
and (8,0) occupy fractions 7/12, 3/12, and 2/12 of the rounds, respectively.]

FIGURE 6.17. This figure shows the cycle that would be used in Iterated
Prisoner’s Dilemma when the probability distribution is $p_{C,C} = \frac{7}{12}$, $p_{C,D} = \frac{3}{12}$,
and $p_{D,C} = \frac{2}{12}$. In this case, the cycle is of length 12, and the average payoffs
are $\frac{7}{12} \cdot 6 + \frac{3}{12} \cdot 0 + \frac{2}{12} \cdot 8 = 4\frac{5}{6}$ for player I and $5\frac{1}{2}$ for player II.


DEFINITION 6.4.9 (Nash equilibrium in a repeated game with average
payoffs). A strategy profile $s = (s_1, \ldots, s_k)$ in the infinitely repeated game $G^\infty$
yields a payoff $u_i^t := u_i(s(H_{t-1}))$ to player $i$ in round $t$. (Notice that $u_i^t$ is a random
variable since each $s_i(H_{t-1})$ is a mixed strategy; see Definition 6.4.1.) The profile
$s$ is a Nash equilibrium (average payoffs) if the following two conditions hold:

• The limit of payoffs exists; i.e., for each player $j$, with probability 1,

$$\lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} u_j^t$$

  exists.

• There is a vector (the average payoff vector) $g = (g_1, \ldots, g_k)$ such that
  for each player $j$ and deviation $s_j'$,

$$g_j = E\Big[\lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} u_j^t\Big] \ge E\Big[\limsup_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} u_j\big(s_j'(H_{t-1}), s_{-j}(H_{t-1})\big)\Big].$$

THEOREM 6.4.10 (The Folk Theorem for Average Payoffs). Let $G$ be a
finite $k$-person game.

(1) If $s^* = (s_1^*, \ldots, s_k^*)$ is a Nash equilibrium (average payoffs) in the infinitely
    repeated game $G^\infty$, then the resulting average payoff vector $(g_1, \ldots, g_k)$ is
    in the payoff polytope and is individually rational.

(2) If $g = (g_1, \ldots, g_k)$ is individually rational and is in the payoff polytope,
    then there is a Nash equilibrium in $G^\infty$ for which the players obtain these
    average payoffs.


6.4.3. Proof of Theorem 6.4.10. Part (1): First, since the payoff limit
exists for $s^*$, the average payoff vector is in the payoff polytope. Second, if the strategies $s^*$
yield an average payoff $g_i$ to player $i$ that is not individually rational, then player $i$
has a better response. Specifically, for each round $t$, have her play a best response
to whatever strategy $s^*$ prescribes for the other agents in round $t$ given the history
up to and including $t - 1$. By construction, this yields her at least her minmax
utility in each round, showing that $s^*$ is not an equilibrium.

Part (2): Let $p$ be a probability distribution over action profiles for which
$g = \sum_a p_a u(a)$. We first prove the theorem assuming that the entries in $p$ are
rational. Let $D$ be a common denominator of the numbers in $p$. Construct a cycle
of action tuples $(a_1, a_2, \ldots, a_D)$ of length $D$ consisting of $D \cdot p_a$ occurrences of
tuple $a$ for each possible action profile. The equilibrium strategies are then defined
as follows: Each player plays the strategy specified by the cycle just described. (See
Figure 6.17 for an example.) If some player $j$ ever deviates from this strategy, then
from that point on, the rest of the players punish him by switching to the strategy
that yields the minmax payoff to $j$. This is a Nash equilibrium because if player $j$
deviates in any way, his payoff from the next round on is at most the minmax payoff,
which is at most $g_j$.
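The cycle construction for rational $p$ can be sketched in a few lines. The helper names are hypothetical, and the sample distribution (weights 7/12, 3/12, 2/12 on the action pairs (C,C), (C,D), (D,C) of the Prisoner's Dilemma above) is an illustrative choice that gives a cycle of length 12, as in Figure 6.17.

```python
from fractions import Fraction
from math import lcm

# Stage-game payoffs (player I, player II).
PAYOFF = {("C", "C"): (6, 6), ("C", "D"): (0, 8),
          ("D", "C"): (8, 0), ("D", "D"): (2, 2)}


def build_cycle(distribution):
    """Return a list of D action profiles containing D * p_a copies of each a."""
    D = lcm(*(p.denominator for p in distribution.values()))
    cycle = []
    for profile, p in distribution.items():
        cycle.extend([profile] * int(p * D))
    return cycle


def average_payoff(cycle):
    """Exact average payoff vector over one pass through the cycle."""
    n = len(cycle)
    return tuple(Fraction(sum(PAYOFF[a][i] for a in cycle), n) for i in range(2))


p = {("C", "C"): Fraction(7, 12),
     ("C", "D"): Fraction(3, 12),
     ("D", "C"): Fraction(2, 12)}
cycle = build_cycle(p)
g = average_payoff(cycle)
```

Because every pass through the cycle contains each profile $a$ exactly $D \cdot p_a$ times, the long-run average payoff is $\sum_a p_a u(a)$; here that is $(29/6, 11/2)$. The order of profiles within the cycle does not affect the average.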


Next we show how to extend this to a Nash equilibrium for an irrational payoff
vector $g$. Let $g(1), g(2), \ldots$ be a sequence of rational payoff vectors in the payoff
polytope that converges to $g$ and satisfies $\|g(j) - g\| > \|g(j+1) - g\|$ for all $j$. Let
$D_j$ be the common denominator of the action tuple probabilities corresponding to
$g(j)$ (as in the previous paragraph). The Nash equilibrium we construct will have
the following form: For $j = 1, 2, \ldots$ play the strategy profile cycle achieving $g(j)$
as described above for $n_j$ rounds, where $n_j$ is selected so that

$$n_j D_j > 2^j \Big( D_{j+1} + \sum_{k<j} n_k D_k \Big). \tag{6.3}$$

We refer to these $n_j$ rounds as the $j$th stage. By construction, the $j$th stage lasts
longer than $2^j$ times all earlier stages plus a single round of stage $j + 1$.
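Condition (6.3) determines the stage lengths recursively once the cycle lengths $D_1, D_2, \ldots$ are known. The sketch below picks the smallest admissible $n_j$ at each step; choosing the smallest value is an assumption of this example (any larger $n_j$ also satisfies (6.3)), and the function name is hypothetical.

```python
def stage_lengths(cycle_lengths):
    """Smallest n_1, ..., n_{m-1} satisfying (6.3), given D_1, ..., D_m.

    cycle_lengths[j - 1] holds D_j; n_j needs D_{j+1}, so one fewer n is returned.
    """
    lengths = []
    elapsed = 0  # sum over k < j of n_k * D_k
    for j in range(1, len(cycle_lengths)):
        d_j, d_next = cycle_lengths[j - 1], cycle_lengths[j]
        bound = 2 ** j * (d_next + elapsed)  # right-hand side of (6.3)
        n_j = bound // d_j + 1               # smallest integer with n_j * d_j > bound
        lengths.append(n_j)
        elapsed += n_j * d_j
    return lengths


example = stage_lengths([2, 3, 5])
```

For cycle lengths 2, 3, 5 this gives $n_1 = 4$ (since $4 \cdot 2 > 2 \cdot 3$) and $n_2 = 18$ (since $18 \cdot 3 > 4 \cdot (5 + 8)$). The stage lengths grow quickly, which is what makes earlier stages negligible in the average.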

We now argue that the limiting payoff the players obtain is $g$. Without loss
of generality assume that $\|u(a)\| \le 1$ for all $a$. Also, let $a_t$ be the action vector
prescribed for step $t$ of the game. Now suppose that at some time $T$, the players
are in stage $\ell + 1$; that is,

$$0 \le T - \Big( \sum_{j=1}^{\ell} n_j D_j + m D_{\ell+1} \Big) < D_{\ell+1},$$

for some nonnegative integer $m < n_{\ell+1}$. Then by (6.3), the current stage $\ell + 1$
round plus all stages $1, \ldots, \ell - 1$ last for no more than $T 2^{-\ell}$ steps. Therefore,

$$\Big\| \sum_{t=1}^{T} u(a_t) - T g \Big\| \le T \|g(\ell) - g\| + T 2^{1-\ell},$$

and therefore as $\ell \to \infty$, the average payoff vector converges to $g$.

If a player ever deviates from the plan above, then from that point on, the rest 
of the players punish him so he receives his minmax payoff. Since g is individually 
rational, this strategy profile is a Nash equilibrium. 


Notes 


The notion of subgame-perfect equilibrium was formalized by Selten [Sel65]. The 
proof that every finite game of perfect information has a pure Nash equilibrium, indeed a 
subgame-perfect equilibrium, is due to Zermelo and Kuhn [Kuh53]. In the 
same paper, Kuhn proved Theorem 6.2.8, showing that in games of perfect recall,
every mixed strategy has a realization-equivalent behavior strategy. The Line-Item Veto
game is from [DN08]; it represents the conflict that arose in 1987 between Reagan and
Congress, though with the preferences reversed from our example. The Large Company
versus Startup game is from [TvS02]. The Centipede Game is due to Rosenthal [Ros81].
See for the results of behavioral experiments and related literature on the 
Centipede Game. 

The mathematical approach to the analysis of games of incomplete information was 
initiated by John Harsanyi and was the major factor in his 
winning of the 1994 Nobel Prize in Economics. The prize was shared with Nash and 
Selten “for their pioneering analysis of equilibria in the theory of non-cooperative games.” 

In we found equilibria in Bayesian games by reducing the game to normal 
form with payoffs averaged over the moves by nature. We know that such equilibria 
have realization-equivalent equilibria in behavioral strategies. These equilibria have the 
property that for each player, given the information he has on the move by nature, his 
strategy is a best response to the strategies of the other players. This is called a Bayesian 
equilibrium. The way Harsanyi made these ideas precise was by referring to the information 
a player has about the move by nature as his type.⁷ Thus, we can think of the move by
nature as assigning types to the different players. The interpretation of an equilibrium in 
behavioral strategies, in a game with moves by nature, as a Bayesian equilibrium is due 
to Harsanyi [Har67]. For more details, see Theorem 9.53]. 

Influential early works on signaling and asymmetric information were the book by 
Spence on signaling in economics and society and the paper of Zahavi [Zah75]
on the handicap principle that emphasized the role of costly signaling. For a broad in- 
troduction to signaling theory, see [Ber06]. A. Michael Spence won the 2001 Nobel Prize 


7 We take this perspective in Chapter 14.

in Economics, together with George Akerlof and Joseph Stiglitz, “for their analyses of
markets with asymmetric information.” 

Repeated games have been the subject of intense study. In a famous experiment 
(see [Axe84]), Axelrod asked people to send him computer programs that play 
Iterated Prisoner’s Dilemma and pitted them against each other. Tit-for-Tat, a four-line 
program sent by Anatol Rapoport, won the competition. 


[Photographs: Robert Aumann and Thomas Schelling.]


The Folk Theorem was known in the game theory community before it appeared in 
journals. A version for discounted payoffs is also known. See [MSZ13].
Some relevant references are [Fri71], [Aum81], and [AS94]. For a broader look
at the topic of repeated games, see [MSZ15]. 

In 2005, Robert Aumann won the Nobel Prize in Economics for his work on repeated 
games, and more generally “for having enhanced our understanding of conflict and coop- 
eration through game-theory analysis.” The prize was shared with Thomas Schelling. 

A theory of repeated games with incomplete information was initiated by Aumann 
and Maschler in the 1960s but only published in 1995 [AM95]. In particular, if the games 
in §6.3.3 are repeated T times, then the gain to player I from knowing which game is being
played is T/2 in the first example, but only o(T) in the second example. 

This chapter provides only a brief introduction to the subject of extensive-form games. 
The reader is encouraged to consult one of the many books that cover the topic in depth 
and analyze other equilibrium notions, e.g., Rasmusen [Ras07], Maschler, Solan, and Za- 
mir [MSZ13], and Fudenberg and Tirole [FT91]. 


Exercises 


6.1. Find all pure equilibria of the Centipede Game (Example 6.1.6). 
6.2. In the Fish Seller Game (Example 6.3.1), suppose that the seller only knows
with probability 0.9 the true status of his fish (fresh or old). Draw the game
tree for this Bayesian game and determine the normal-form representation
of the game.


S 6.3. Consider the zero-sum two-player game in which the game to be played is
randomized by a fair coin toss. (This example was discussed in §2.5.1.) If
the toss comes up heads, the payoff matrix is given by $A^H$, and if tails, it
is given by $A^T$:


                  player II                           player II
        A^H       L      R                  A^T       L      R
  player I   U    …      …            player I   U    …      …
             D    6      0                       D    4      10


For each of the settings below, draw the Bayesian game tree, convert 
to normal form, and find the value of the game. 
• Suppose that player I is told the result of the coin toss and both players
  play simultaneously.
• Suppose that player I is told the result of the coin toss but she must
  reveal her move first.


6.4. Kuhn Poker: Consider a simplified form of poker in which the deck has only
three cards: a Jack, a Queen, and a King (ranked from lowest to highest),
and there are two players, I and II. The game proceeds as follows:

• The game starts with each player anteing $1.
• Each player is dealt one of the cards.
• Player I can either pass (P) or bet (B) $1.
   - If player I bets, then player II can either fold (F) or call (C)
     (adding $1 to the pot).
   - If player I passes, then player II can pass (P) or bet $1 (B).
      * If player II bets, then player I can either fold or call.
• If one of the players folds, the other player takes the pot. If neither
  folds, the player with the high card wins what’s in the pot.

Find a Nash equilibrium in this game via reduction to the normal form.
Observe that in this equilibrium, there is bluffing and overbidding.


6.5. Consider an extensive-form game consisting of a series of sequential auc- 
tions for three different laptops. In round 1, there is an auction for laptop 
1, with participants A and B. In round 2, there is an auction for laptop 2, 
with participants C and D. In round 3, there is an auction for laptop 3, 
with participants B and C. Each auction is a second-price auction: Each 
participant submits a bid, and the person with the higher bid wins but 
pays the bid of the loser. Suppose also that each participant has a value 
for a laptop: Assume that $v_A = 1$, $v_B = 100$, $v_C = 100$, and $v_D = 99$.
The utility of a participant is 0 if he loses in all auctions he participates 
in, and it is his value for a laptop minus the sum of all payments he makes 
otherwise. 

A strategy for each player specifies, given the history, what bid that 
player submits in each auction he participates in. Show that there is an 
equilibrium in which players A, B, and C win.

CHAPTER 7


Evolutionary and correlated equilibria 


7.1. Evolutionary game theory 


Biology has brought a kind of thuggish brutality to the refined intellectual world 
of game theory. — Alan Grafen 


Most of the games we have considered so far involve rational players optimizing
their strategies. A new perspective was proposed by John Maynard Smith and
George Price in 1973: Each player could be an organism whose pure strategy is
encoded in its genes.¹ A strategy that yields higher payoffs enables greater
reproductive success.² Thus, genes coding for such strategies increase in frequency in
the next generation.

Interactions in the population are modeled by randomly selecting two indi- 
viduals, who then play a game. Thus, each player faces a mixed strategy with 
probabilities corresponding to population frequencies. 


We begin with an example, a variant of our old nemesis, the game of Chicken.

7.1.1. Hawks and Doves. The game described in Figure 7.1 is a simple
model for two behaviors (one bellicose, the other pacifist) within the population
of a single species. This game has the following payoff matrix:


                            player II
                        H                     D
  player I   H   (v/2 - c, v/2 - c)        (v, 0)
             D        (0, v)             (v/2, v/2)


Now imagine a large population, each of whose members are hardwired genetically 
either as hawks or as doves, and assume that those who do better at this game have 
more offspring. We will argue that if (x, 1 — x) is a symmetric Nash equilibrium in 
this game, then these will also be equilibrium proportions in the population. 


Let’s see what the Nash equilibria are. If $c < v/2$, the game is a version of
Prisoner’s Dilemma and (H, H) is the only equilibrium. When $c > v/2$, there are
two pure Nash equilibria: (H, D) and (D, H); and since the game is symmetric,
there is a symmetric mixed Nash equilibrium. Suppose each player plays H with
probability $x \in (0, 1)$. For this to be player I’s strategy in a Nash equilibrium, the
payoffs to player II from playing H and D must be equal:

$$\text{(L)} \quad x\Big(\frac{v}{2} - c\Big) + (1 - x)v = (1 - x)\frac{v}{2} \quad \text{(R)}. \tag{7.1}$$


1 A player may not be aware of his strategy.
2 This is known as natural selection.

[Figure: payoff labels v/2, v/2 (two doves) and v/2 - c, v/2 - c (two hawks).]

FIGURE 7.1. Two players play this game for a prize of value v > 0. They 
confront each other, and each chooses (simultaneously) to fight or to flee; 
these two strategies are called the “hawk” (H) and the “dove” (D) strategies, 
respectively. If they both choose to fight (two hawks), then each incurs a cost 
c, and the winner (either is equally likely) takes the prize. If a hawk faces a 
dove, the dove flees, and the hawk takes the prize. If two doves meet, they 
split the prize equally. 


For this to hold, we need $x = \frac{v}{2c}$, which by the assumption is less than 1. By
symmetry, player II will do the same thing.


Population dynamics for Hawks and Doves. Now suppose we have the fol- 
lowing dynamics in the population: Throughout their lives, random members of 
the population pair off and play Hawks and Doves; at the end of each generation, 
members reproduce in numbers proportional to their winnings. Let x denote the 
fraction of hawks in the population. 

If $x < \frac{v}{2c}$, then in equation (7.1), (L) > (R): the expected payoff for a hawk
is greater than that for a dove, and so in the next generation, $x$, the fraction of
hawks, will increase.

On the other hand, if $x > \frac{v}{2c}$, then (L) < (R): the expected payoff for a dove
is higher than that of a hawk, and so in the next generation, $x$ will decrease.
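These dynamics can be simulated directly. The update rule below is one possible formalization, not the only one: each type's share is rescaled by its fitness relative to the population mean, and a constant `baseline` fitness is added to every payoff so that fitness stays positive (hawk payoffs can be negative). Both the update rule and the baseline are assumptions of this sketch, as are the function names.

```python
def hawk_payoff(x, v, c):
    """Left side (L) of (7.1): expected payoff to a hawk when a fraction x are hawks."""
    return x * (v / 2 - c) + (1 - x) * v


def dove_payoff(x, v, c):
    """Right side (R) of (7.1): expected payoff to a dove."""
    return (1 - x) * v / 2


def next_generation(x, v, c, baseline):
    """New hawk fraction when offspring numbers are proportional to fitness."""
    hawk_fitness = baseline + hawk_payoff(x, v, c)
    dove_fitness = baseline + dove_payoff(x, v, c)
    mean_fitness = x * hawk_fitness + (1 - x) * dove_fitness
    return x * hawk_fitness / mean_fitness


def simulate(x0, v, c, baseline, generations=500):
    """Iterate the update from an initial hawk fraction x0."""
    x = x0
    for _ in range(generations):
        x = next_generation(x, v, c, baseline)
    return x


# With v = 2 and c = 3, the mixed equilibrium is x = v / (2c) = 1/3.
from_above = simulate(0.9, v=2, c=3, baseline=5)
from_below = simulate(0.05, v=2, c=3, baseline=5)
```

Starting from either a hawk-heavy or a dove-heavy population, the hawk fraction moves toward $v/(2c) = 1/3$, in line with the sign analysis above.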


7.1.2. Evolutionarily stable strategies. Consider a symmetric, two-player 
game with n pure strategies each and payoff matrices A and B for players I and II, 
with $A_{ij} = B_{ji}$.

We take the point of view that a symmetric mixed strategy in this game cor- 
responds to the proportions of each type within the population. 


Licensed to AMS. 
License or copyright restrictions may apply to redistribution; see http://www.ams.org/publications/ebooks/terms 




To motivate the formalism, suppose a population with strategy $x$ is invaded by
a small population of mutants of type $z$ (that is, playing strategy $z$), so the new
composition is $\varepsilon z + (1 - \varepsilon)x$, where $\varepsilon$ is small. The new payoffs will be

$$\varepsilon x^T A z + (1 - \varepsilon) x^T A x \quad \text{(for $x$'s)}, \tag{7.2}$$
$$\varepsilon z^T A z + (1 - \varepsilon) z^T A x \quad \text{(for $z$'s)}. \tag{7.3}$$


The criteria for $x$ to be an evolutionarily stable strategy will imply that, for
small enough $\varepsilon$, the average payoff for $x$’s will be strictly greater than that for $z$’s,
so the invaders will disappear. Formally:


DEFINITION 7.1.1. A mixed strategy $x \in \Delta_n$ is an evolutionarily stable
strategy (ESS) if for any pure “mutant” strategy $z$:
(a) $z^T A x \le x^T A x$.
(b) If $z^T A x = x^T A x$, then $z^T A z < x^T A z$.


Observe that criterion (a) is equivalent to $x$ being a (symmetric) Nash
equilibrium.³ Thus, if $x$ is a Nash equilibrium, criterion (a) holds with equality for any $z$
in the support of $x$.

Assuming (a), no mutant will fare strictly better against the current population 
strategy x than x itself. However, a mutant strategy z could still successfully invade 
if it does just as well as x when playing against x and when playing against another 
z mutant. Criterion (b) excludes this possibility. 


EXAMPLE 7.1.2 (Hawks and Doves). We will verify that the mixed Nash
equilibrium $x = (\frac{v}{2c}, 1 - \frac{v}{2c})$ (i.e., H is played with probability $\frac{v}{2c}$) is an ESS when
$c > \frac{v}{2}$. First, we observe that both pure strategies satisfy criterion (a) with equality,
so we check (b).

• If $z = (1, 0)$ (“H”), then $z^T A z = \frac{v}{2} - c$, which is strictly less than
  $x^T A z = \frac{v}{2c}(\frac{v}{2} - c) + (1 - \frac{v}{2c}) \cdot 0$.
• If $z = (0, 1)$ (“D”), then $z^T A z = \frac{v}{2} < x^T A z = \frac{v}{2c} \cdot v + (1 - \frac{v}{2c})\frac{v}{2}$.

Thus, the mixed Nash equilibrium for Hawks and Doves is an ESS. 
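The two criteria of Definition 7.1.1 are easy to check mechanically. The sketch below uses exact rational arithmetic so that the equality test in criterion (b) is reliable. Two assumptions are made: mutants are pure strategies different from $x$ itself (a pure $x$ is not tested against itself), and the function names are hypothetical.

```python
from fractions import Fraction


def bilinear(u, A, w):
    """Compute u^T A w for vectors u, w and a square matrix A."""
    return sum(u[i] * A[i][j] * w[j]
               for i in range(len(u)) for j in range(len(w)))


def is_ess(x, A):
    """Check criteria (a) and (b) of Definition 7.1.1 against every pure mutant."""
    n = len(x)
    for k in range(n):
        z = [Fraction(int(i == k)) for i in range(n)]
        if z == list(x):
            continue  # z coincides with x, so it is not a mutant
        z_vs_x, x_vs_x = bilinear(z, A, x), bilinear(x, A, x)
        if z_vs_x > x_vs_x:
            return False  # criterion (a) fails
        if z_vs_x == x_vs_x and not bilinear(z, A, z) < bilinear(x, A, z):
            return False  # criterion (b) fails
    return True


# Hawks and Doves with v = 2, c = 3: rows and columns are ordered (H, D).
v, c = Fraction(2), Fraction(3)
hawk_dove = [[v / 2 - c, v], [Fraction(0), v / 2]]
x_star = [v / (2 * c), 1 - v / (2 * c)]
hawk_dove_is_ess = is_ess(x_star, hawk_dove)
```

For $v = 2$, $c = 3$ the check confirms that $x = (1/3, 2/3)$ is an ESS. The same function can be applied to any symmetric game given by the row player's payoff matrix.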




EXAMPLE 7.1.3 (Rock-Paper-Scissors). The unique Nash equilibrium in
Rock-Paper-Scissors, $x = (\frac{1}{3}, \frac{1}{3}, \frac{1}{3})$, is not evolutionarily stable.

                            player II
                       Rock   Paper   Scissors
  player I   Rock        0     -1        1
             Paper       1      0       -1
             Scissors   -1      1        0

This is because the payoff of x against any strategy is 0, and the payoff of any 
pure strategy against itself is also 0, and thus, the expected payoff of x and z will 
be equal. This suggests that under appropriate notions of population dynamics, 
cycling will occur: A population with many Rocks will be taken over by Paper, 
which in turn will be invaded (bloodily, no doubt) by Scissors, and so forth. These 
dynamics have been observed in nature, in particular in a California lizard.⁴


3 This is shorthand for (x, x) being a Nash equilibrium. 
4 The description of this example follows, almost verbatim, the exposition of Gintis [Gin00].

The side-blotched lizard Uta stansburiana has three distinct types of male:
orange-throat, blue-throat, and yellow-striped. All females of the species are yellow- 
striped. The orange-throated males are violently aggressive, keep large harems 
of females, and defend large territories. The yellow-striped males are docile and 
look like receptive females. In fact, the orange-throats can’t distinguish between 
the yellow-striped males and females. This enables the yellow-striped males to 
sneak into their territory and secretly copulate with the females. The blue-throats 
are less aggressive than the orange-throats, keep smaller harems (small enough to 
distinguish their females from yellow-striped males), and defend small territories. 

Researchers have observed a six-year cycle starting with domination, say, by 
the orange-throats. Eventually, the orange-throats amass territories and harems 
so large that they can no longer be guarded effectively against the sneaky yellow- 
striped males, who are able to secure a majority of copulations and produce the 
largest number of offspring. When the yellow-striped lizards become very com- 
mon, however, the males of the blue-throated variety get an edge: Since they have 
small harems, they can detect yellow-striped males and prevent them from invad- 
ing their harems. Thus, a period when the blue-throats become dominant follows. 
However, the aggressive orange-throats do comparatively well against blue-throats 
since they can challenge them and acquire their harems and territories, thus prop- 
agating themselves. In this manner, the population frequencies eventually return 
to the original ones, and the cycle begins anew. 

When John Maynard Smith learned that Uta stansburiana were “playing” Rock-
Paper-Scissors, he reportedly⁵ exclaimed, “They have read my book!”


FIGURE 7.2. The three types of male lizard Uta stansburiana. Picture courtesy
of Barry Sinervo; see http://bio.research.ucsc.edu/~barrylab


5 This story is reported in [Sig05].

EXAMPLE 7.1.4 (Unstable mixed Nash equilibrium). In this game,

                      player II
                    A           B
  player I   A   (10,10)      (0,0)
             B    (0,0)       (5,5)


both pure strategies (A, A) and (B, B) are evolutionarily stable, while the symmetric
mixed Nash equilibrium $x = (\frac{1}{3}, \frac{2}{3})$ is not.

Although (B, B) is evolutionarily stable, if a sufficiently large population of A’s 
invades, then the “stable” population will in fact shift to being entirely composed 
of A’s. Specifically, if, after the A’s invade, the new composition is a fraction A’s 
and 1 — a fraction B’s, then using (7.2), the payoffs for each type are 

5(1—a) (for B’s) 
10a (for A’s). 
Thus if a > 1/3, the payoffs of the A’s will be higher and they will “take over”. 


EXERCISE 7.a (Mixed population invasion). Consider the following game: 


                        player II
                    A           B           C
  player I   A    (0,0)       (6,2)      (-1,-1)
             B    (2,6)       (0,0)       (3,9)
             C   (-1,-1)      (9,3)       (0,0)

Find two mixed Nash equilibria, one supported on {A, B}, the other supported on 
{B,C}. Show that they are both ESS, but the {A,B} equilibrium is not stable 
when invaded by an arbitrarily small population composed of half B’s and half C’s. 


EXAMPLE 7.1.5 (Sex ratios). Evolutionary stability can be used to explain sex
ratios in nature. In mostly monogamous species, it seems natural that the birth rate
of males and females should be roughly equal. But what about sea lions, in which
a single male gathers a large harem of females, while many males never reproduce?
Game theory helps explain why reproducing at a 1:1 ratio remains stable. To
illustrate this, consider the following highly simplified model. Suppose that each
harem consists of one male and ten females. If $M$ is the number of males in the
population and $F$ the number of females, then the number of “lucky” males, that
is, males with a harem, is $M_L = \min(M, F/10)$. Suppose also that each mating pair
has $b$ offspring on average. A random male has a harem with probability $M_L/M$,
and if he does, he has $10b$ offspring on average. Thus, the expected number of
offspring a random male has is $E[C_m] = 10bM_L/M = b\min(10, F/M)$. On the
other hand, the number of females that belong to a harem is $F_L = \min(F, 10M)$,
and thus the expected number of offspring a female has is $E[C_f] = bF_L/F =
b\min(1, 10M/F)$.

FIGURE 7.3. Sea lion life.

If $M < F$, then $E[C_m] > E[C_f]$, and individuals with a higher propensity to
have male offspring than females will tend to have more grandchildren, resulting in
a higher proportion of genes in the population with a propensity for male offspring.
In other words, the relative birthrate of males will increase. On the other hand, if
$M > F$, then $E[C_m] < E[C_f]$, and the relative birthrate of females increases. (Of
course, when $M = F$, we have $E[C_m] = E[C_f]$, and the sex ratio is stable.)
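The two expectations in this model are simple enough to tabulate. The sketch below is a direct transcription of the formulas for $E[C_m]$ and $E[C_f]$; the function name and the sample population sizes are illustrative assumptions.

```python
def expected_offspring(males, females, b=1.0, harem_size=10):
    """Return (E[C_m], E[C_f]) in the harem model of Example 7.1.5."""
    lucky_males = min(males, females / harem_size)          # males with a harem
    females_in_harems = min(females, harem_size * males)
    per_male = harem_size * b * lucky_males / males
    per_female = b * females_in_harems / females
    return per_male, per_female


more_females = expected_offspring(100, 200)   # M < F
balanced = expected_offspring(100, 100)       # M = F
more_males = expected_offspring(200, 100)     # M > F
```

With $b = 1$, a population of 100 males and 200 females gives a male twice the expected offspring of a female, so producing sons is favored; with 200 males and 100 females the advantage reverses, and at $M = F$ the two expectations coincide.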


7.2. Correlated equilibria 


If there is intelligent life on other planets, in a majority of them, they would 
have discovered correlated equilibrium before Nash equilibrium. |— Roger Myerson 


EXAMPLE 7.2.1 (Battle of the Sexes). The wife wants to head to the opera, 
but the husband yearns instead to spend an evening watching baseball. Neither is 
satisfied by an evening without the other. In numbers, player I being the wife and 
player II the husband, here is the scenario:

                      husband
                  opera    baseball
  wife  opera     (4,1)     (0,0)
        baseball  (0,0)     (1,4)


In this game, there are two pure Nash equilibria: Both go to the opera or both 
watch baseball. There is also a mixed Nash equilibrium which yields each player 
an expected payoff of 4/5 (when the wife plays (4/5,1/5) and the husband plays 
(1/5,4/5)). This mixed equilibrium hardly seems rational: The payoff a player gets 
is lower than what he or she would obtain by going along with his or her spouse’s 
preference. How might this couple decide between the two pure Nash equilibria? 

One way to do this would be to pick a joint action based on a flip of a single 
coin. For example, the two players could agree that if the coin lands heads, then 
both go to the opera; otherwise, both watch baseball. Observe that even after the 
coin toss, neither player has an incentive to unilaterally deviate from the agreement. 


To motivate the concept of correlated equilibrium introduced below, observe 
that a mixed strategy pair in a two-player general-sum game with action spaces [m] 
and [n] can be described by a random pair of actions: $R$ with distribution $x \in \Delta_m$
and $C$ with distribution $y \in \Delta_n$, picked independently by players I and II. Thus,

$$P[R = i, C = j] = x_i y_j.$$

It follows from Lemma 4.3.7 that $x, y$ is a Nash equilibrium if and only if

$$P[R = i] > 0 \implies E[a_{i,C}] \ge E[a_{\ell,C}]$$

for all $i$ and $\ell$ in $[m]$ and

$$P[C = j] > 0 \implies E[b_{R,j}] \ge E[b_{R,k}]$$

for all $j$ and $k$ in $[n]$.

FIGURE 7.4. This figure illustrates the difference between a Nash equilibrium
and a correlated equilibrium. In a Nash equilibrium, the probability that
player I plays i and player II plays j is the product of the two corresponding
probabilities (in this case $p_i q_j$), whereas a correlated equilibrium puts a
probability, say $z_{ij}$, on each pair (i, j) of strategies.


DEFINITION 7.2.2. A correlated strategy pair is a pair of random actions
(R, C) with an arbitrary joint distribution

\[ z_{ij} = P[R = i, C = j]. \]

The next definition formalizes the idea that, in a correlated equilibrium, if
player I knows that the players’ actions (R, C) are picked according to the joint
distribution z and player I is informed only that R = i, then she has no incentive
to switch to some other action $\ell$.


DEFINITION 7.2.3. A correlated strategy pair in a two-player game with payoff
matrices A and B is a correlated equilibrium if

\[ P[R = i] > 0 \implies E[a_{i,C} \mid R = i] \ge E[a_{\ell,C} \mid R = i] \tag{7.4} \]

for all $i$ and $\ell$ in [m] and

\[ P[C = j] > 0 \implies E[b_{R,j} \mid C = j] \ge E[b_{R,k} \mid C = j] \]

for all $j$ and $k$ in [n].

REMARK 7.2.4. In terms of the distribution z, the inequality in condition
(7.4) is

\[ \sum_j \frac{z_{ij}}{\sum_k z_{ik}}\, a_{ij} \ge \sum_j \frac{z_{ij}}{\sum_k z_{ik}}\, a_{\ell j}. \]

Thus, z is a correlated equilibrium iff for all $i$ and $\ell$,

\[ \sum_j z_{ij} a_{ij} \ge \sum_j z_{ij} a_{\ell j} \]

and for all $j$ and $k$,

\[ \sum_i z_{ij} b_{ij} \ge \sum_i z_{ij} b_{ik}. \]


FIGURE 7.5. The left figure shows the distribution player I faces (the labels
on the columns) when the correlated equilibrium indicates that she should
play i. Given this distribution over columns, Definition 7.2.3 says that she
has no incentive to switch to a different row strategy. The right figure shows
the distribution player II faces when told to play j.

The next example illustrates a more sophisticated correlated equilibrium that 
is not simply a mixture of Nash equilibria. 


EXAMPLE 7.2.5 (Chicken, revisited). In this game, (S,D) and (D,S) are
Nash equilibria with payoffs of (2,7) and (7,2), respectively. There is also a mixed
Nash equilibrium in which each player plays S with probability 2/3 and D with
probability 1/3, resulting in an expected payoff of $4\tfrac{2}{3}$.

                               player II
                          Swerve (S)   Drive (D)
    player I  Swerve (S)    (6, 6)      (2, 7)
              Drive (D)     (7, 2)      (0, 0)


The following probability distribution z is a correlated equilibrium which results
in an expected payoff of $4\tfrac{1}{2}$ to each player, worse than the mixed Nash equilibrium:

                               player II
                          Swerve (S)   Drive (D)
    player I  Swerve (S)      0           1/2
              Drive (D)      1/2           0

A more interesting correlated equilibrium, that yields a payoff outside the convex
hull of the Nash equilibrium payoffs, is the following:

                               player II
                          Swerve (S)   Drive (D)
    player I  Swerve (S)     1/3          1/3
              Drive (D)      1/3           0

For this correlated equilibrium, it is crucial that the row player only knows R and
the column player only knows C. Otherwise, in the case that the outcome is (S, S),
each player would have an incentive to deviate (unilaterally).

Thus, to implement a correlated equilibrium, an external mediator is typically
needed. Here, the external mediator chooses the pair of actions (R,C) according to
this distribution ((S,D), (D,S), (S,S) with probability 1/3 each) and then discloses to
each player which action he or she should take (but not the action of the opponent).
At this point, each player is free to follow or to reject the suggested action. It is in
their best interest to follow the mediator’s suggestion, and thus this distribution is
a correlated equilibrium.

To see this, suppose the mediator tells player I to play D. In this case, she 
knows that player II was told to play S and player I does best by complying to 
collect the payoff of 7. She has no incentive to deviate. 

On the other hand, if the mediator tells her to play S, she is uncertain about
what player II was told, but conditioned on what she is told, she knows that (S,S)
and (S,D) are equally likely. If she follows the mediator’s suggestion and plays S,
her payoff will be $6 \times \tfrac12 + 2 \times \tfrac12 = 4$, while her expected payoff from switching is
$7 \times \tfrac12 = 3.5$, so player I is better off following the suggestion.

We emphasize that the random actions (R,C) used in this correlated equilibrium
are dependent, so this is not a Nash equilibrium. Moreover, the expected
payoff to each player when both follow the suggestion is $2 \times \tfrac13 + 6 \times \tfrac13 + 7 \times \tfrac13 = 5$.
This is better than what they would obtain from the symmetric Nash equilibrium
or from averaging the two asymmetric Nash equilibria.
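
The conditions of Remark 7.2.4 are finite sets of linear inequalities, so they are easy to test by machine. The sketch below is an illustration only: the function names `is_correlated_equilibrium` and `expected_payoffs` are assumptions introduced here, and the matrices are the Chicken payoffs from Example 7.2.5.

```python
from fractions import Fraction as Fr


def is_correlated_equilibrium(A, B, z):
    """Check the inequalities of Remark 7.2.4 for the joint distribution z."""
    m, n = len(A), len(A[0])
    # Player I: told to play row i, switching to row l must not help.
    for i in range(m):
        for l in range(m):
            if sum(z[i][j] * (A[i][j] - A[l][j]) for j in range(n)) < 0:
                return False
    # Player II: told to play column j, switching to column k must not help.
    for j in range(n):
        for k in range(n):
            if sum(z[i][j] * (B[i][j] - B[i][k]) for i in range(m)) < 0:
                return False
    return True


def expected_payoffs(A, B, z):
    """Expected payoffs (player I, player II) under the joint distribution z."""
    cells = [(i, j) for i in range(len(A)) for j in range(len(A[0]))]
    return (sum(z[i][j] * A[i][j] for i, j in cells),
            sum(z[i][j] * B[i][j] for i, j in cells))


# Chicken: index 0 = Swerve (S), index 1 = Drive (D).
A = [[6, 2], [7, 0]]
B = [[6, 7], [2, 0]]

third, half = Fr(1, 3), Fr(1, 2)
z_mediated = [[third, third], [third, Fr(0)]]   # the "more interesting" distribution
z_coin = [[Fr(0), half], [half, Fr(0)]]         # mixture of the two pure equilibria
z_always_swerve = [[Fr(1), Fr(0)], [Fr(0), Fr(0)]]  # (S,S) for sure

print(is_correlated_equilibrium(A, B, z_mediated), expected_payoffs(A, B, z_mediated))
print(is_correlated_equilibrium(A, B, z_coin), expected_payoffs(A, B, z_coin))
print(is_correlated_equilibrium(A, B, z_always_swerve))
```

Putting all the mass on (S,S) fails the test, which matches the observation that a player who knew the outcome was (S,S) would deviate to D.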


Notes 


The Alan Grafen quote at the beginning of the chapter is from [HH07]. Evolutionary 
stable strategies were introduced by John Maynard Smith and George Price 
(though Nash in his thesis already discussed the interpretation of a mixed strat- 
egy in terms of population frequencies). For a detailed account of how game theory affected 
evolutionary biology, see the classic book by John Maynard Smith [Smi82]. The concept 
has found application in a number of fields including biology, ecology, psychology, and 
political science. For more information on evolutionary game theory, see Chapters 6, 11, 
and 13 in [YZ15]. The study of evolutionary stable strategies has led to the development 
of evolutionary game dynamics, usually studied via systems of differential equations. See,
e.g., [HS98].




John Maynard Smith 
The description of cycling in the frequencies of different types of Uta stansburiana
males and the connection to Rock-Paper-Scissors is due to B. Sinervo and C. M.
Lively [SL96].

The notion of correlated equilibrium was introduced in 1974 by Robert Aumann 
[Aum87]. The fact that every finite game has at least one Nash equilibrium 
implies that every finite game has a correlated equilibrium. Hart and Schmeidler 
provide an elementary direct proof (via the minimax theorem) of the existence of correlated 
equilibria in games with finitely many players and strategies. Note that while we have 
only defined correlated equilibrium here in two-player games, the notion extends naturally 
to more players. See, e.g., Chapter 8 of [MSZ13].

Surprisingly, finding a correlated equilibrium in large scale problems is computation- 
ally easier than finding a Nash equilibrium. In fact, there are no computationally efficient 
algorithms known for finding Nash equilibria, even in two-player games. However, cor- 
related equilibria can be computed via linear programming. (See, e.g., for an 
introduction to linear programming.) For a discussion of the complexity of computing 
Nash equilibria and correlated equilibria, see the survey by Papadimitriou in [YZ15]. 


Exercises 
7.1. Find all Nash equilibria and determine which of the symmetric equilibria
are evolutionarily stable in the following games:

                    player II                              player II
                    A       B                              A       B
    player I   A  (4,4)   (2,5)      and      player I   A  (4,4)   (3,2)
               B  (5,2)   (3,3)                           B  (2,3)   (5,5)


S 7.2. Consider the following symmetric game as played by two drivers, both 
trying to get from Here to There (or two computers routing messages along 
cables of different bandwidths). There are two routes from Here to There; 
one is wider and therefore faster, but congestion will slow them down if 
both take the same route. Denote the wide route W and the narrower 
route N. The payoff matrix is

                              player II (yellow)
                                W         N
    player I (red)     W      (3,3)     (5,4)
                       N      (4,5)     (2,2)


FIGURE 7.6. The leftmost image shows the payoffs when both drivers drive
on the narrower route, the middle image shows the payoffs when both drivers
drive on the wider route, and the rightmost image shows what happens when
the red driver (player I) chooses the wide route and the yellow driver (player
II) chooses the narrow route.



Find all Nash equilibria and determine which ones are evolutionarily stable. 


7.3. Argue that in a symmetric game, if $a_{ii} > b_{i,j}$ $(= a_{j,i})$ for all $j \ne i$, then
pure strategy $i$ is an evolutionarily stable strategy.


7.4. Occasionally, two parties resolve a dispute (pick a “winner”) by playing a
variant of Rock-Paper-Scissors. In this version, the parties are penalized if
there is a delay before a winner is declared; a delay occurs when both players
choose the same strategy. The resulting payoff matrix is the following:

                                  player II
                         Rock       Paper      Scissors
    player I  Rock      (-1,-1)     (0,1)      (1,0)
              Paper     (1,0)      (-1,-1)     (0,1)
              Scissors  (0,1)       (1,0)     (-1,-1)

Show that this game has a unique Nash equilibrium that is fully mixed,
and results in expected payoffs of 0 to both players. Then show that the
following probability distribution is a correlated equilibrium in which the
players obtain expected payoffs of 1/2:

                                  player II
                         Rock       Paper      Scissors
    player I  Rock        0          1/6        1/6
              Paper      1/6          0         1/6
              Scissors   1/6         1/6         0

CHAPTER 8


The price of anarchy 


In this chapter, we study the price of anarchy, the worst-case ratio between 
the quality of a socially optimal outcome and the quality of a Nash equilibrium 
outcome. 


8.1. Selfish routing 


On Earth Day in 1990, the New York City traffic commissioner made the de- 
cision to close 42nd Street, one of the most congested streets in Manhattan. Many 
observers predicted that disastrous traffic conditions would ensue. Surprisingly, 
however, overall traffic and typical travel times actually improved. As we shall see 
next, phenomena like this, where reducing the capacity of a road network actually 
improves travel times, can be partially explained with game theory. 




FIGURE 8.1. Each link in the left figure is labeled with a latency function L(x) 
which describes the travel time on an edge as a function of the congestion x 
on that edge. (The congestion x is the fraction of traffic going from A to B 
that takes this edge.) In Nash equilibrium, each driver chooses the route that 
minimizes his own travel time, given the routes chosen by the other drivers. 
The unique Nash equilibrium in this network, shown on the right, is obtained 
by sending half the traffic to the top and half to the bottom. Thus, the 
latency each driver experiences is 3/2. This is also the optimal routing; i.e., 
it minimizes the average latency experienced by the drivers. 


EXAMPLE 8.1.1 (Braess Paradox). A large number of drivers head from point 
A to point B each morning. There are two routes, through C and through D. The 
travel time on each road may depend on the traffic on it as shown in Figure 8.1.
Each driver, knowing the traffic on each route, will choose his own path selfishly, 
that is, to minimize his own travel time, given what everyone else is doing. In this 
example, in equilibrium, exactly half of the traffic will go on each route, yielding 
an average travel time of 3/2. This setting, where the proportion of drivers taking 
a route can have any value in [0, 1], is called “infinitesimal drivers”. 
Now consider what can happen when a new, very fast highway is added between
C and D. Indeed, we will assume that this new route is so fast that we can simply 
think of the travel time on it as being 0. 


FIGURE 8.2. The Braess Paradox: Each link in the top figure is labeled with
a latency function $\ell(x)$ which describes the travel time on that edge as a
function of the fraction x of traffic using that edge. These figures show the
effect of adding a 0 latency road from C to D: The travel time on each of
$\gamma_C = A - C - B$ and $\gamma_D = A - D - B$ is always at least the travel time on the
new route $\gamma = A - C - D - B$. Moreover, if a positive fraction of the traffic
takes route $\gamma_C$ (resp. $\gamma_D$), then the travel time on $\gamma$ is strictly lower than
that of $\gamma_C$ (resp. $\gamma_D$). Thus, the unique Nash equilibrium is for all the traffic
to go on the path $\gamma$, as shown in the bottom left figure. In this equilibrium,
the average travel time the drivers experience is 2, as shown on the bottom
left. On the other hand, if the drivers could be forced to choose routes that
would minimize the average travel time, it would be reduced to 3/2, the social
optimum, as shown on the bottom right.


One would think that adding a fast road could never slow down traffic, but
surprisingly in this case it does: As shown in Figure 8.2, the average travel time in
equilibrium increases from 3/2 to 2. This phenomenon, where capacity is added
to a system and, in equilibrium, average driver travel time increases, is called the
Braess Paradox.
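
The equilibrium claims in Example 8.1.1 can be verified directly. The sketch below makes an assumption that the figures would normally settle: the congestion-dependent links (latency x) are taken to be A→C and D→B, and the constant links (latency 1) are C→B and A→D, which is consistent with the caption's statement that the new route is never slower than the two old ones. The route labels and function names are illustrative.

```python
from fractions import Fraction as Fr

ROUTES = ("ACB", "ADB", "ACDB")


def route_latencies(flow):
    """Travel time on each route, given the fraction of traffic on every route."""
    x_ac = flow["ACB"] + flow["ACDB"]   # traffic on A->C (assumed latency x)
    x_db = flow["ADB"] + flow["ACDB"]   # traffic on D->B (assumed latency x)
    return {
        "ACB": x_ac + 1,           # A->C, then C->B with constant latency 1
        "ADB": 1 + x_db,           # A->D with constant latency 1, then D->B
        "ACDB": x_ac + 0 + x_db,   # uses the new zero-latency road C->D
    }


def average_latency(flow):
    lat = route_latencies(flow)
    return sum(flow[r] * lat[r] for r in ROUTES)


def is_equilibrium(flow, available):
    """Every route that carries traffic must be a fastest available route."""
    lat = route_latencies(flow)
    best = min(lat[r] for r in available)
    return all(lat[r] == best for r in available if flow[r] > 0)


half = Fr(1, 2)
split = {"ACB": half, "ADB": half, "ACDB": Fr(0)}
all_new = {"ACB": Fr(0), "ADB": Fr(0), "ACDB": Fr(1)}

without_shortcut = ("ACB", "ADB")
with_shortcut = ROUTES

print(is_equilibrium(split, without_shortcut), average_latency(split))
print(is_equilibrium(split, with_shortcut))
print(is_equilibrium(all_new, with_shortcut), average_latency(all_new))
print(average_latency(all_new) / average_latency(split))
```

With the shortcut available, the even split stops being an equilibrium because the new route has latency 1 against 3/2 on the routes in use; the ratio printed on the last line is 4/3.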


We define the socially optimal traffic flow to be the partition of traffic 
that minimizes average latency. The crux of the Braess Paradox is that while the 
social optimum can only improve when roads are added to the network, the Nash 
equilibrium can get worse. 

We use the term price of anarchy to measure the ratio between performance
in equilibrium and the social optimum. In Example 8.1.1,

\[ \text{price of anarchy} := \frac{\text{average travel time in worst Nash equilibrium}}{\text{average travel time in socially optimal outcome}} = \frac{2}{3/2} = \frac{4}{3}. \]


In fact, we will see that in any road network with affine latency functions and
infinitesimal drivers, as in this example, the price of anarchy is at most 4/3! We will
develop this result soon, but first, let’s calculate the price of anarchy in a couple of
simple scenarios.


EXAMPLE 8.1.2 (Pigou-type examples). In the example of Figure 8.3 with
latency functions ax and bx (a and b are both constants greater than 0), the Nash
equilibrium and optimal flow are the same: the solution to ax = b(1 − x). Thus,
the price of anarchy is 1.




FIGURE 8.3. The figure on the right shows the equilibrium and optimal flows 
(which are identical in this case). Both have a fraction b/(a + b) of the traffic 
on the upper link, resulting in the same latency of ab/(a+ b) on both the top 
and bottom links. Thus, the price of anarchy in this network is 1. 




FIGURE 8.4. The top figure shows the latency functions on each edge. The 
bottom left figure shows the Nash equilibrium flow, which has an average 
latency of 1. The bottom right shows the optimal flow, which has an average 
latency of 3/4. 


On the other hand, as shown in Figure 8.4, with latency functions of x and
1, the Nash equilibrium sends all traffic to the top link, whereas the optimal flow
sends half the traffic to the top and half to the bottom, for a price of anarchy
of 4/3.
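
As a quick check of the value 4/3, here is a worked computation under the latency functions x (top link) and 1 (bottom link) stated above. The symbol C(x), for the average latency when a fraction x of the traffic uses the top link, is notation introduced only for this check.

```latex
\begin{align*}
C(x) &= x \cdot x + (1 - x) \cdot 1 = x^2 - x + 1, \\
C'(x) &= 2x - 1 = 0 \iff x = \tfrac12, \qquad C\bigl(\tfrac12\bigr) = \tfrac34, \\
\text{price of anarchy} &= \frac{C(1)}{C(1/2)} = \frac{1}{3/4} = \frac43 .
\end{align*}
```

Here C(1) = 1 is the average latency at the Nash equilibrium, where all traffic takes the top link.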


8.1.1. Bounding the price of anarchy. Consider an arbitrary road network G,
in which one unit of traffic flows from source s to destination t. Let $\mathcal{P}_{st}$ be
the set of paths in G from s to t. Each driver chooses a path $P \in \mathcal{P}_{st}$. Let $f_P$
be the fraction of drivers that take path P. Write $f := (f_P)_{P \in \mathcal{P}_{st}}$ for the resulting
traffic flow. The space of possible flows is

\[ \Delta(\mathcal{P}_{st}) = \Bigl\{ f : f_P \ge 0 \ \forall P \ \text{and} \ \sum_{P \in \mathcal{P}_{st}} f_P = 1 \Bigr\}. \]

Given such a flow, the induced (traffic) flow on an edge e is

\[ F_e := \sum_{P \,:\, e \in P} f_P. \tag{8.1} \]


FIGURE 8.5. This figure shows the relationship between the edge flow $F_e$ and
the path flows that contribute to it (in this example, the flows on the three
paths that use edge e) and depicts the computation of $\ell_e(F_e)$. The contribution
of edge e to $L(f) = \sum_P f_P L_P(f)$ is precisely $F_e \ell_e(F_e)$. See (8.3).


Denote by $\ell_e(x)$ the latency on edge e as a function of x, the amount of traffic
on the edge. Throughout this section, we assume that latency functions are weakly
increasing and continuous. Notice that each driver that chooses path P experiences
the same latency:

\[ L_P(f) = \sum_{e \in P} \ell_e(F_e). \tag{8.2} \]

We denote by L(f) the total latency of all the traffic. Since a fraction $f_P$ of the
traffic has latency $L_P(f)$, this total latency is

\[ L(f) = \sum_P f_P L_P(f). \]

In equilibrium, each driver will choose some lowest latency path with respect to the 
current choices of other drivers: 
DEFINITION 8.1.3. A flow f is a (Nash) equilibrium flow if and only if
whenever $f_P > 0$, the path P is a minimum latency path; that is,

\[ f_P > 0 \implies L_P(f) = \min_{P' \in \mathcal{P}_{st}} L_{P'}(f). \]


REMARK 8.1.4. In §8.1.3, we show that equilibrium flows exist.


REMARK 8.1.5. We can equivalently calculate the latency L(f) from the edge
flows $F_e$. (See Figure 8.5.) Since the latency experienced by the flow $F_e$ across edge
e is $\ell_e(F_e)$, we can write L(f) in two ways:

\[ L(f) = \sum_P f_P L_P(f) = \sum_e F_e \ell_e(F_e). \tag{8.3} \]


The next proposition generalizes this equation to the setting where the edge
latencies are determined by one flow ($f$) and the routing is specified by a different
flow ($\tilde f$).


PROPOSITION 8.1.6. Let $f$ and $\tilde f$ be two path flows with corresponding edge flows
$\{F_e\}_{e \in E}$ and $\{\tilde F_e\}_{e \in E}$, respectively. Then

\[ \sum_P \tilde f_P L_P(f) = \sum_e \tilde F_e \ell_e(F_e). \tag{8.4} \]


PROOF. We have

\[ \sum_P \tilde f_P L_P(f) = \sum_P \tilde f_P \sum_{e \in P} \ell_e(F_e) = \sum_e \ell_e(F_e) \sum_{P \,:\, e \in P} \tilde f_P = \sum_e \tilde F_e \ell_e(F_e), \]

using (8.2) for the first equality and (8.1) for the last.


The next lemma asserts that if the edge latencies are determined by an equilibrium
flow and are fixed at these values, then any other flow has weakly higher
latency.


LEMMA 8.1.7. Let $f$ be an equilibrium flow and let $\tilde f$ be any other path flow,
with corresponding edge flows $\{F_e\}_{e \in E}$ and $\{\tilde F_e\}_{e \in E}$, respectively. Then

\[ \sum_e (\tilde F_e - F_e)\, \ell_e(F_e) \ge 0. \tag{8.5} \]


PROOF. Let $L = \min_{P \in \mathcal{P}_{st}} L_P(f)$. By Definition 8.1.3, if $f_P > 0$, then
$L_P(f) = L$. Since $\sum_P f_P = \sum_P \tilde f_P = 1$, it follows that

\[ \sum_P f_P L_P(f) = L \quad \text{and} \quad \sum_P \tilde f_P L_P(f) \ge L. \tag{8.6} \]

We combine these using (8.4) and (8.3) to get

\[ \sum_e \tilde F_e \ell_e(F_e) \ge \sum_e F_e \ell_e(F_e). \]

8.1.2. Affine latency functions.


THEOREM 8.1.8. Let G be a network where one unit of traffic is routed from
a source s to a destination t. Suppose that the latency function on each edge e is
affine; that is, $\ell_e(x) = a_e x + b_e$, for constants $a_e, b_e \ge 0$. Let f be an equilibrium
flow in this network and let f* be an optimal flow; that is,

\[ L(f^*) = \min\{ L(f) : f \in \Delta(\mathcal{P}_{st}) \}. \]

Then the price of anarchy is at most 4/3; i.e.,

\[ L(f) \le \tfrac{4}{3} L(f^*). \]


REMARK 8.1.9. When the latency functions are linear (i.e., when be = 0 for all 
links e), the price of anarchy is 1. See 


PROOF. Let $\{F_e^*\}_{e \in E}$ be the set of edge flows corresponding to the optimal
(overall latency minimizing) path flow f*. By Lemma 8.1.7,

\[ L(f) = \sum_e F_e (a_e F_e + b_e) \le \sum_e F_e^* (a_e F_e + b_e). \tag{8.7} \]

Thus,

\[ L(f) - L(f^*) \le \sum_e F_e^* a_e (F_e - F_e^*). \]

Using the inequality $x(y - x) \le y^2/4$ (which follows from $(x - y/2)^2 \ge 0$), we deduce
that

\[ L(f) - L(f^*) \le \frac14 \sum_e a_e F_e^2 \le \frac14 L(f). \]
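
For completeness, the bound of Theorem 8.1.8 follows from the last display by a one-line rearrangement (spelled out here as a reading aid):

```latex
\[
L(f) - L(f^*) \le \tfrac14 L(f)
\;\Longrightarrow\; \tfrac34 L(f) \le L(f^*)
\;\Longrightarrow\; L(f) \le \tfrac43 L(f^*).
\]
```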


A corollary of this theorem is that the Braess Paradox example we saw earlier 
is extremal. 


COROLLARY 8.1.10. If additional roads are added to a road network with affine
latency functions, the latency at Nash equilibrium can increase by at most a factor
of 4/3.


PROOF. Let G be a road network and H an augmented version of G. Let $f_G$
denote an equilibrium flow in G and $f_G^*$ an optimal flow in G. Similarly for H.
Clearly $L(f_H^*) \le L(f_G^*)$. It follows that

\[ L(f_H) \le \tfrac43 L(f_H^*) \le \tfrac43 L(f_G^*) \le \tfrac43 L(f_G). \]


8.1.3. Existence of equilibrium flows. 


LEMMA 8.1.11. Consider an s-t road network, where the latency function on
edge e is $\ell_e(\cdot)$. If all latency functions are nonnegative, weakly increasing and
continuous, then a Nash equilibrium exists.


REMARK 8.1.12. This game is a continuous analogue of the congestion games 


discussed in 


PROOF. The set of possible path flows is the simplex $\Delta(\mathcal{P}_{st})$ of distributions
over paths. The mapping (8.1) from path flows to edge flows is continuous and
linear, so the set K of all possible edge flows is compact and convex. Define a
potential function over edge flows:

\[ \Phi(F) := \sum_e \int_0^{F_e} \ell_e(x)\,dx. \]

Note that $\Phi$ is a convex function since the latency functions $\ell_e(\cdot)$ are weakly
increasing.

We will show that an edge flow F that minimizes $\Phi$ is a Nash equilibrium flow.
To see this, let f be any path flow corresponding to edge flow F. (Note that this
path flow is not necessarily unique.) Let P* be any path of minimum latency under
F, and let L* be the latency on this path. If P is a path of latency L > L*, we
claim that $f_P = 0$. If not, then the flow obtained by moving $\delta$ units of flow from P
to P* has a lower value of $\Phi$: Doing this changes $\Phi$ by

\begin{align*}
\Delta\Phi &= \sum_{e \in P^* \setminus P} \int_{F_e}^{F_e + \delta} \ell_e(x)\,dx - \sum_{e \in P \setminus P^*} \int_{F_e - \delta}^{F_e} \ell_e(x)\,dx \\
&= \delta \sum_{e \in P^* \setminus P} \ell_e(F_e) - \delta \sum_{e \in P \setminus P^*} \ell_e(F_e) + o(\delta) \\
&= \delta (L^* - L) + o(\delta),
\end{align*}

which is negative for $\delta$ sufficiently small.


8.1.4. Beyond affine latency functions. Let $\mathcal{L}$ be a class of latency functions.
In §8.1.2, we showed that if

\[ \mathcal{L} = \{ ax + b \mid a, b \ge 0 \}, \]

then the price of anarchy is 4/3, which is achieved in the Pigou network shown in
Figure 8.4. When the set of latency functions is expanded, e.g.,

\[ \mathcal{L}' = \{ ax^2 + bx + c \mid a, b, c \ge 0 \}, \]

the price of anarchy is worse, as shown in Figure 8.6. However, as we shall see
next, the price of anarchy for any class of latency functions and any network is
maximized in a Pigou network (Figure 8.7)!


FIGURE 8.6. With the given latency functions ($x^d$ on the upper link and the
constant 1 on the lower link), the optimal flow routes x units
of flow on the upper link (and thus 1 − x on the lower link) so as to minimize
the average latency, which is $x \cdot x^d + (1 - x)$. The Nash equilibrium flow routes
all the flow on the upper link. The resulting price of anarchy is approximately
1.6 for d = 2, approximately 1.9 for d = 3, and is asymptotic to $d/\ln d$ as d
tends to infinity.
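
The values quoted in the caption of Figure 8.6 can be reproduced numerically. This is a small sketch under the assumption stated in the caption's formula, namely that the upper link has latency x^d and the lower link has constant latency 1; the function name `pigou_price_of_anarchy` is made up for the illustration.

```python
def pigou_price_of_anarchy(d):
    """Price of anarchy in the two-link network with latencies x**d and 1."""
    # Optimal flow: minimize C(x) = x * x**d + (1 - x) over 0 <= x <= 1.
    # Setting C'(x) = (d + 1) * x**d - 1 = 0 gives x = (d + 1) ** (-1/d).
    x_opt = (d + 1) ** (-1.0 / d)
    optimal_cost = x_opt ** (d + 1) + (1.0 - x_opt)
    # Nash equilibrium: all traffic on the upper link, with latency 1**d = 1.
    nash_cost = 1.0
    return nash_cost / optimal_cost


for d in (1, 2, 3, 10, 100):
    print(d, round(pigou_price_of_anarchy(d), 3))
```

For d = 1 this recovers the affine bound 4/3, and the ratio grows without bound as d increases.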
Suppose that there are r units of flow from s to t. If $\ell(x)$ is the latency function
on the upper link with x units of flow and if it is strictly increasing, then $\ell(r)$ is the
smallest constant latency that can be assigned to the bottom edge that induces an
equilibrium flow using only the top edge.


FIGURE 8.7. Here we are assuming that there are r units of flow from s to
t. The Pigou price of anarchy $\alpha_r(\ell)$ is the price of anarchy in this network.
Since latency functions are weakly increasing, $\ell(r) \ge \ell(x)$ for any $0 \le x \le r$
and thus in the worst Nash equilibrium, all flow is on the top edge. (There
may be other Nash equilibria as well.)


DEFINITION 8.1.13. Let $\alpha_r(\ell)$ be the Pigou price of anarchy for latency
function $\ell(\cdot)$ in the network shown in Figure 8.7 when the total flow from s to t is
r; i.e.,

\[ \alpha_r(\ell) := \frac{r\,\ell(r)}{\min_{0 \le x \le r} \bigl[ x\,\ell(x) + (r - x)\,\ell(r) \bigr]}. \tag{8.8} \]

REMARK 8.1.14. It will be useful below to note that the minimum in the
denominator is unchanged if you take it over $x \ge 0$ since $x(\ell(x) - \ell(r)) \ge 0$ for
$x \ge r$.


THEOREM 8.1.15. Let $\mathcal{L}$ be a class of latency functions, and define

\[ A_r(\mathcal{L}) := \max_{0 \le r' \le r} \, \sup_{\ell \in \mathcal{L}} \alpha_{r'}(\ell). \]

Let G be a network with latency functions in $\mathcal{L}$ and total flow r from s to t. Then
the price of anarchy in G is at most $A_r(\mathcal{L})$.


PROOF. Let f and f* be an equilibrium flow and an optimal flow in G, with
corresponding edge flows F and F*. Fix an edge e in G and consider a Pigou
network (as in Figure 8.7) with $\ell(x) := \ell_e(x)$ and total flow $r := F_e$. Then, by
(8.8),

\[ \alpha_{F_e}(\ell_e) = \frac{F_e \cdot \ell_e(F_e)}{\min_{0 \le x \le F_e} \bigl[ x \cdot \ell_e(x) + (F_e - x) \cdot \ell_e(F_e) \bigr]} \ge \frac{F_e \cdot \ell_e(F_e)}{F_e^* \cdot \ell_e(F_e^*) + (F_e - F_e^*) \cdot \ell_e(F_e)}, \]

where the final inequality follows from Remark 8.1.14. Rearranging, we obtain

\[ F_e^* \cdot \ell_e(F_e^*) \ge \frac{1}{\alpha_{F_e}(\ell_e)}\, F_e \cdot \ell_e(F_e) + (F_e^* - F_e) \cdot \ell_e(F_e), \]

and summing over e yields

\[ L(f^*) \ge \frac{1}{A_r(\mathcal{L})}\, L(f) + \sum_e (F_e^* - F_e) \cdot \ell_e(F_e). \]

Observe that Lemma 8.1.7 also applies to the case where the total flow is r. (In
(8.6), replace L by rL.) Thus, applying (8.5) to the second sum yields

\[ L(f^*) \ge \frac{1}{A_r(\mathcal{L})}\, L(f). \]


8.1.5. A traffic-anarchy tradeoff. The next result shows that the effect of
anarchy cannot be worse than doubling the total amount of traffic.


THEOREM 8.1.16. Let G be a road network with a specified source s and sink t
where r units of traffic are routed from s to t, and let f be a corresponding equilibrium
flow with total latency L(f). Let f* be an optimal flow when 2r units of traffic
are routed in the same network, resulting in total latency L(f*). Then

\[ L(f) \le L(f^*). \]

PROOF. Suppose that all the paths in use in the equilibrium flow f have latency
L. As in the proof of Lemma 8.1.7, we have that

\[ L(f) = \sum_P f_P L_P(f) = rL \quad \text{and} \quad \sum_P f_P^* L_P(f) \ge 2rL. \tag{8.9} \]

We will show that

\[ \sum_P f_P^* L_P(f) \le L(f) + L(f^*), \tag{8.10} \]

which together with (8.9) completes the proof.
We first rewrite (8.10) in terms of edges using (8.4):

\[ \sum_e F_e^* \ell_e(F_e) \le \sum_e F_e^* \ell_e(F_e^*) + \sum_e F_e \ell_e(F_e). \tag{8.11} \]

We claim that this inequality holds for each edge; i.e.,

\[ F_e^* \bigl( \ell_e(F_e) - \ell_e(F_e^*) \bigr) \le F_e \ell_e(F_e). \tag{8.12} \]

To verify this, consider separately the case where $F_e^* \ge F_e$ and $F_e^* < F_e$, and use
the fact that $\ell_e(\cdot)$ is increasing.

REMARK 8.1.17. An alternative interpretation of this result is that doubling 
the capacity of every link can compensate for the lack of central control. See 


8.2. Network formation games 


Consider a set of companies jointly constructing a communication network. 
Each company needs to connect a source to a sink. The cost of a link used by 
several companies is shared equally between them. How does this cost sharing rule 
affect their choices? 

We model this via the following fair network formation game: There is a
directed graph G whose edges E represent links that can be constructed. Associated
with each link e is a cost $c_e$. There are k players; player i chooses a path $P_i$ in
G from node $s_i$ to node $t_i$ and pays his fair share of the cost of each link in path
$P_i$. Thus, if link e is on the paths selected by r players, then each of them must
pay $c_e/r$ to construct that link. The goal of each player is to minimize his total
payment.


FIGURE 8.8. In this example, there are two Nash equilibria. It is a Nash
equilibrium for all players to choose the upper path, resulting in a cost of
$(1 + \epsilon)/k$ to each player. However, there is also a bad Nash equilibrium: If all
players choose the lower path, then no player has an incentive to deviate. In
this case, each player’s cost is 1, whereas if he switched to the upper path, his
cost would be $1 + \epsilon$.


REMARK 8.2.1. A fair network formation game is a potential game via the
potential function

\[ \phi(s) = \sum_{e \in E(s)} \sum_{j=1}^{n_e(s)} \frac{c_e}{j}. \]

(See Section 4.4.)


In the example shown in Figure 8.8, there are k players, each requiring the use
of a path from a source s to a destination t. There is a bad Nash equilibrium in
which the cost of constructing the network is approximately k times as large as
the minimum possible cost, yielding a price of anarchy of about k. However, this
equilibrium is extremely unstable: After a single player switches to the top path,
all the others will follow.

The example of Figure 8.8 inspires us to ask if there always exists a Nash
equilibrium which is close to optimal. The network in Figure 8.9 shows that the
ratio can be as high as $H_k \approx \ln k$.


THEOREM 8.2.2. In every fair network formation game with k players, there is
a Nash equilibrium with cost at most $H_k = 1 + \tfrac12 + \tfrac13 + \cdots + \tfrac1k = \ln k + O(1)$ times
the optimal cost.


FIGURE 8.9. In this example, each of the k players needs to choose a path from
$s_i$ to t. The optimal cost network here is for all players i to choose the path
from $s_i$ to v to t, resulting in a network of total cost $1 + \epsilon$. However, this is not
a Nash equilibrium. Indeed, for player k, it is a dominant strategy to use the
path $s_k \to t$ since his alternative cost is at least $(1 + \epsilon)/k$. Given this choice
of player k, it is dominant for player k − 1 to choose path $s_{k-1} \to t$. Iterating
yields an equilibrium where player i chooses the path $s_i \to t$. This equilibrium
is unique since it arises by iterated removal of dominated strategies. (In fact,
this is the only correlated equilibrium for this example.) The cost of the
resulting network is approximately $H_k = 1 + \tfrac12 + \tfrac13 + \cdots + \tfrac1k$ times that of the
cheapest network.


PROOF. Given a pure strategy profile s for the players, let $n_e(s)$ be the number
of players that use link e. Let E(s) be the set of links with $n_e(s) \ge 1$. Since this
game is a potential game, we know that best-response dynamics will lead to a pure
Nash equilibrium. Observe that all strategy profiles s satisfy

\[ \mathrm{cost}(s) := \sum_{e \in E(s)} c_e \le \phi(s) = \sum_{e \in E(s)} \sum_{j=1}^{n_e(s)} \frac{c_e}{j} \le H_k \, \mathrm{cost}(s). \]

Let $s_{opt}$ be a strategy profile that minimizes cost(s). Best-response dynamics
starting from $s_{opt}$ reduce the value of the potential function and terminate in a
Nash equilibrium, say $s_f$. It follows that

\[ \mathrm{cost}(s_f) \le \phi(s_f) \le \phi(s_{opt}) \le \mathrm{cost}(s_{opt})\, H_k, \]

which completes the proof.
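
The potential-function argument can be tried out on the two-link network of Figure 8.8. The sketch below is illustrative only: it assumes each path consists of a single link (upper link of cost 1 + ε, lower link of cost k), picks k = 4 and ε = 1/10 arbitrarily, and uses helper names that do not come from the text.

```python
from fractions import Fraction as Fr
from itertools import product


def loads(profile, num_links):
    """Number of players on each link."""
    counts = [0] * num_links
    for link in profile:
        counts[link] += 1
    return counts


def player_cost(profile, i, costs):
    """Fair share paid by player i: the cost of her link split among its users."""
    link = profile[i]
    return Fr(costs[link]) / loads(profile, len(costs))[link]


def social_cost(profile, costs):
    """Total cost of the links that are actually built."""
    return sum(costs[link] for link in set(profile))


def potential(profile, costs):
    """phi(s) = sum over links e of c_e * (1 + 1/2 + ... + 1/n_e(s))."""
    counts = loads(profile, len(costs))
    return sum(Fr(costs[e]) * sum(Fr(1, j) for j in range(1, counts[e] + 1))
               for e in range(len(costs)))


def best_response_dynamics(profile, costs):
    """Let players switch links while some switch strictly lowers their own cost."""
    profile = list(profile)
    changed = True
    while changed:
        changed = False
        for i in range(len(profile)):
            for link in range(len(costs)):
                trial = profile[:i] + [link] + profile[i + 1:]
                if player_cost(trial, i, costs) < player_cost(profile, i, costs):
                    profile, changed = trial, True
    return profile


k = 4
eps = Fr(1, 10)
UPPER, LOWER = 0, 1
costs = [1 + eps, Fr(k)]          # assumed link costs for the Figure 8.8 network
H_k = sum(Fr(1, j) for j in range(1, k + 1))

bad_equilibrium = best_response_dynamics([LOWER] * k, costs)
after_one_switch = best_response_dynamics([LOWER] * (k - 1) + [UPPER], costs)

# cost(s) <= phi(s) <= H_k * cost(s) for every pure strategy profile s.
sandwich_holds = all(
    social_cost(s, costs) <= potential(s, costs) <= H_k * social_cost(s, costs)
    for s in product(range(2), repeat=k)
)

print(bad_equilibrium, after_one_switch, sandwich_holds)
```

Starting from the all-lower profile nobody moves, but once one player sits on the upper link the others follow, which is the instability described in the text. The loop terminates because every strict improvement lowers the potential.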


8.3. A market sharing game 


There are k NBA teams, and each of them must decide in which city to locate.¹
Let $v_j$ be the profit potential, i.e., the number of basketball fans, of city j. If $\ell$
teams select city j, they each obtain a utility of $v_j/\ell$. See Figure 8.10.


PROPOSITION 8.3.1. The market sharing game is a potential game and hence
has a pure Nash equilibrium. (See Exercise 4.18.)


¹In 2014, Steve Ballmer, based in Seattle, bought the Clippers, an NBA team based in Los
Angeles. He chose to keep the team in Los Angeles, even though there was already another NBA
team there, but none in Seattle. Moving the team would increase the total number of fans with
a hometown basketball team but would reduce the profit potential of the Clippers.

FIGURE 8.10. Three basketball teams are deciding which city to locate in 
when four choices are available. It is a Nash equilibrium for all of them to 
locate in the largest city where they will each have a utility of 1.1 million. If 
one of the teams were to switch to one of the smaller cities, that team’s utility 
would drop to 1 million. 


For any set $S$ of cities, define the total value

$$ V(S) = \sum_{j \in S} v_j. $$

Assume that $v_j \ge v_{j+1}$ for all $j$. Clearly $S^* = \{1, \ldots, k\}$ maximizes $V(S)$ over all
sets of size $k$.
We use $c = (c_1, \ldots, c_k)$ to denote a strategy profile in the market sharing game,
where $c_i$ represents the city chosen by team $i$. Let $S := \{c_1, \ldots, c_k\}$ be the set of
cities selected.


LEMMA 8.3.2. Let $c$ and $\tilde{c}$ be any two strategy profiles in the market sharing
game, where the corresponding sets of cities selected are $S$ and $\tilde{S}$. Denote by
$u_i(c_i, c_{-i})$ the utility obtained by team $i$ if it chooses city $c_i$ and the other teams
choose cities $c_{-i}$. Then

$$ \sum_i u_i(\tilde{c}_i, c_{-i}) \ge V(\tilde{S}) - V(S). $$


PROOF. Let $\tilde{c}_i \in \tilde{S} \setminus S$. Then $u_i(\tilde{c}_i, c_{-i}) = v_{\tilde{c}_i}$, since no other team occupies that city. Thus

$$ \sum_i u_i(\tilde{c}_i, c_{-i}) \ge V(\tilde{S} \setminus S) \ge V(\tilde{S}) - V(S). \tag{8.13} $$
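
The inequality of Lemma 8.3.2 can be checked by brute force on a small instance. The sketch below is illustrative only: it assumes the four city values of Figure 8.10 (in units of 100,000 fans) and three teams, and the names `VALUES`, `utility`, and `deviation_sum` are hypothetical helpers, not notation from the text.

```python
from fractions import Fraction
from itertools import product

# Assumed instance: city values in units of 100,000 fans, as in Figure 8.10.
VALUES = [Fraction(33), Fraction(10), Fraction(10), Fraction(10)]
NUM_TEAMS = 3


def total_value(cities):
    """V(S): total value of the set of distinct cities in `cities`."""
    return sum(VALUES[j] for j in set(cities))


def utility(i, profile):
    """Utility of team i: the value of its city split among the teams located there."""
    city = profile[i]
    return VALUES[city] / profile.count(city)


def deviation_sum(c, c_tilde):
    """Sum over i of u_i(c_tilde_i, c_{-i}): each team deviates separately from c."""
    total = Fraction(0)
    for i in range(len(c)):
        deviated = c[:i] + (c_tilde[i],) + c[i + 1:]
        total += utility(i, deviated)
    return total


def lemma_holds_everywhere():
    """Check the lemma's inequality for every ordered pair of strategy profiles."""
    profiles = list(product(range(len(VALUES)), repeat=NUM_TEAMS))
    return all(
        deviation_sum(c, ct) >= total_value(ct) - total_value(c)
        for c in profiles
        for ct in profiles
    )


print(lemma_holds_everywhere())
```

Exact rational arithmetic (`Fraction`) is used so that ties between utilities are not blurred by floating-point rounding.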


REMARK 8.3.3. Lemma 8.3.2 is a typical step in many price of anarchy proofs.
It disentangles the sum of utilities of the players when each separately deviates
from $c_i$ to $\tilde{c}_i$ in terms of the quantities $V(S)$ and $V(\tilde{S})$. This is sufficient to prove
that the price of anarchy of this game is at most 2.


THEOREM 8.3.4. Suppose that $c = (c_1, \ldots, c_k)$ is a Nash equilibrium in the
market sharing game and $S := S(c)$ is the corresponding set of cities selected.
Then the price of anarchy is at most 2; i.e., $V(S^*) \le 2V(S)$.




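PROOF. Let $c^* = (c_1^*, \ldots, c_k^*)$ be a strategy profile whose set of selected cities is $S^*$. We claim that

$$ V(S) = \sum_i u_i(c_i, c_{-i}) \ge \sum_i u_i(c_i^*, c_{-i}) \ge V(S^*) - V(S), \tag{8.14} $$

which proves the theorem. The first equality in (8.14) is by definition. The inequality
$u_i(c_i, c_{-i}) \ge u_i(c_i^*, c_{-i})$ follows from the fact that $c$ is a Nash equilibrium. The
final inequality follows from Lemma 8.3.2.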
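
As an illustration of Theorem 8.3.4, the following sketch enumerates all pure strategy profiles of a small market sharing game, keeps the Nash equilibria, and compares their value with the best achievable value. The instance (values of Figure 8.10 in units of 100,000 fans, three teams) and the helper names are assumptions made for this example.

```python
from fractions import Fraction
from itertools import combinations, product

# Assumed instance: city values in units of 100,000 fans, as in Figure 8.10.
VALUES = [Fraction(33), Fraction(10), Fraction(10), Fraction(10)]
NUM_TEAMS = 3


def utility(i, profile):
    """Utility of team i: the value of its city split among the teams located there."""
    city = profile[i]
    return VALUES[city] / profile.count(city)


def is_nash(profile):
    """True if no team can strictly gain by moving to another city on its own."""
    for i in range(len(profile)):
        for city in range(len(VALUES)):
            deviated = profile[:i] + (city,) + profile[i + 1:]
            if utility(i, deviated) > utility(i, profile):
                return False
    return True


def total_value(profile):
    """V(S) for the set S of cities selected in the profile."""
    return sum(VALUES[j] for j in set(profile))


def best_total_value():
    """V(S*): the largest total value of any set of NUM_TEAMS distinct cities."""
    return max(
        sum(VALUES[j] for j in chosen)
        for chosen in combinations(range(len(VALUES)), NUM_TEAMS)
    )


equilibria = [c for c in product(range(len(VALUES)), repeat=NUM_TEAMS) if is_nash(c)]
for c in equilibria:
    print(c, total_value(c), best_total_value() <= 2 * total_value(c))
```

On this instance the only equilibrium found is the one described in the caption of Figure 8.10, with all three teams in the largest city.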


REMARK 8.3.5. In Exercise 8.7, we'll see that the price of anarchy bound of 2
can be replaced by $2 - 1/k$.


8.4. Atomic selfish routing 


In §8.1.2, we considered a selfish routing game in which each driver was infinitesimal.
We revisit selfish routing, but in a setting where there are few drivers and
each one contributes significantly to the total travel time.


EXAMPLE 8.4.1. Consider a road network $G = (V, E)$ and a set of $k$ drivers,
with each driver $i$ traveling from a starting node $s_i \in V$ to a destination $t_i \in V$.
Associated with each edge $e \in E$ is a latency function $\ell_e(n) = a_e \cdot n + b_e$ representing
the cost of traversing edge $e$ if $n$ drivers use it. Driver $i$'s strategic decision is which
path $P_i$ to choose from $s_i$ to $t_i$, and her objective is to choose a path with minimum
latency.



FIGURE 8.11. In the left graph, each directed edge is labeled with the travel 
time as a function of the number of drivers using it. Here the purple driver 
is traveling from a to b, the green driver is traveling from a to c, the blue 
driver is traveling from b to c, and the yellow driver is traveling from c to b. 
The figure in the middle shows the socially optimal routes (all single-hop), 
that is, the routes that minimize the average driver travel time. Each edge is 
labeled with the latency that driver experiences. This set of routes is also a 
Nash equilibrium. On the other hand, there is also a bad Nash equilibrium, 
in which each driver takes a 2-hop path, shown in the figure on the right. In 
the middle and right figures, the label on each colored path is the latency that 
particular driver experiences on that link. In this case, the price of anarchy is 
5/2. 


In the example shown in Figure 8.11, the socially optimal outcome is also a
Nash equilibrium with a total latency of 4, but there is another Nash equilibrium
with a total latency of 10. Next, we show that in any network with affine latency
functions, the price of anarchy (the ratio between travel time in the worst Nash
equilibrium and that in the socially optimal outcome) is at most 5/2.

Denote by $L_i(P_i, P_{-i})$ the latency along the path $P_i$ selected by driver $i$, given
the paths $P_{-i}$ selected by the other drivers, and let

$$ L(P) := \sum_i L_i(P_i, P_{-i}) $$

be the sum of these latencies.
We will need the following claim: 


CLAIM 8.4.2. Let $n$ and $m$ be any nonnegative integers. Then

$$ n(m+1) \le \frac{5}{3} n^2 + \frac{1}{3} m^2. $$


PROOF. We have to show that $f(m,n) := 5n^2 + m^2 - 3n(m+1) \ge 0$ for
nonnegative integers $m, n$. The cases where $m = 0$ or $n = 0$ or $m = n = 1$ are clear.
In all other cases $n + m \ge 3$, so $f(m,n) = (2n - m)^2 + n(n + m - 3) \ge 0$.
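
A quick numerical sanity check of Claim 8.4.2 is sketched below. It multiplies the inequality by 3 so that only integer arithmetic is needed; the function names and the search range are arbitrary choices for this illustration, and a finite search is of course no substitute for the proof.

```python
def claim_holds(n: int, m: int) -> bool:
    """Claim 8.4.2 multiplied by 3: 3 n (m + 1) <= 5 n^2 + m^2."""
    return 3 * n * (m + 1) <= 5 * n * n + m * m


def tight_cases(limit: int):
    """Pairs (n, m) below `limit` for which the claim holds with equality."""
    return [
        (n, m)
        for n in range(limit)
        for m in range(limit)
        if 3 * n * (m + 1) == 5 * n * n + m * m
    ]


print(all(claim_holds(n, m) for n in range(200) for m in range(200)))
print(tight_cases(50))
```

The equality cases matter: integrality is essential, since the inequality fails for some real values (for instance $n = 1/2$, $m = 1$).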


The next lemma is the key to our price of anarchy bounds. 


LEMMA 8.4.3. Let $P = (P_1, \ldots, P_k)$ be any strategy profile (a path $P_i$ from $s_i$
to $t_i$ for each $i$) in the atomic selfish routing game $G$. Let $P^* = (P_1^*, \ldots, P_k^*)$ be
the paths that minimize the total travel time $L(P^*)$. Then

$$ \sum_i L_i(P_i^*, P_{-i}) \le \frac{5}{3} L(P^*) + \frac{1}{3} L(P). \tag{8.15} $$


PROOF. Let $n_e$ be the number of paths in $P$ that use edge $e$. Then the total
travel time experienced by the drivers under $P$ is

$$ L(P) = \sum_i L_i(P_i, P_{-i}) = \sum_i \sum_{e \in P_i} (a_e \cdot n_e + b_e) = \sum_e n_e (a_e \cdot n_e + b_e). $$

Let $n_e^*$ be the number of paths among $P_1^*, \ldots, P_k^*$ that use edge $e$. Observe that

$$ \sum_i L_i(P_i^*, P_{-i}) \le \sum_i \sum_{e \in P_i^*} (a_e \cdot (n_e + 1) + b_e), $$

since switching from $P_i$ to $P_i^*$ can increase the number of drivers that use $e$ by at
most 1. Thus

$$ \sum_i L_i(P_i^*, P_{-i}) \le \sum_e n_e^* (a_e \cdot (n_e + 1) + b_e), $$

since the (upper bound on the) travel time on each edge is counted a number of
times equal to the number of paths in $P^*$ that use it. Finally, using Claim 8.4.2,
we have $n_e^*(n_e + 1) \le \frac{5}{3}(n_e^*)^2 + \frac{1}{3} n_e^2$, so

$$\begin{aligned}
\sum_i L_i(P_i^*, P_{-i}) &\le \sum_e \Big( a_e \Big( \frac{5}{3} (n_e^*)^2 + \frac{1}{3} n_e^2 \Big) + b_e n_e^* \Big) \\
&\le \frac{5}{3} \sum_e n_e^* (a_e n_e^* + b_e) + \frac{1}{3} \sum_e a_e n_e^2 \\
&\le \frac{5}{3} L(P^*) + \frac{1}{3} L(P).
\end{aligned}$$


THEOREM 8.4.4. The price of anarchy of the atomic selfish routing game is
5/2.


PROOF. The example in Figure 8.11 shows that the price of anarchy is at least
5/2. To see that it is at most 5/2, let $P = (P_1, \ldots, P_k)$ be a Nash equilibrium
profile in the atomic selfish routing game. Then

$$ L(P) = \sum_i L_i(P_i, P_{-i}) \le \sum_i L_i(P_i^*, P_{-i}), $$

where the inequality follows from the fact that $P$ is a Nash equilibrium.
Thus, by (8.15),

$$ L(P) \le \frac{5}{3} L(P^*) + \frac{1}{3} L(P). $$

Finally, rearranging, we get

$$ L(P) \le \frac{5}{2} L(P^*). $$
8.4.1. Extension theorems. The crucial step in the price of anarchy bound
we just obtained was (8.15), which enabled us to disentangle the term
$\sum_i L_i(P_i^*, P_{-i})$ into a linear combination of $L(P^*)$ and $L(P)$, the latencies
associated with the two “parent” strategy profiles. Such a disentanglement enables
us to prove a price of anarchy bound for a pure Nash equilibrium that extends
automatically to certain scenarios in which players are not in Nash equilibrium. In
this section, we prove this fact for a general cost minimization game.

Let $G$ be a game in which players are trying to minimize their costs. For a
strategy profile $s \in S_1 \times S_2 \times \cdots \times S_k$, let $C_i(s)$ be the cost incurred by player $i$ on
strategy profile $s$. As usual, strategy profile $s = (s_1, \ldots, s_k)$ is a Nash equilibrium
if for each player $i$ and $s_i' \in S_i$,

$$ C_i(s_i, s_{-i}) \le C_i(s_i', s_{-i}). $$

Define the overall cost in profile $s$ to be

$$ \mathrm{cost}(s) := \sum_i C_i(s). $$


DEFINITION 8.4.5. Let $s = (s_1, \ldots, s_k)$ be a strategy profile in cost-minimization
game $G$. Let $s^*$ be the strategy profile that minimizes cost(s). Then $G$ is $(\lambda, \mu)$-smooth
if

$$ \sum_i C_i(s_i^*, s_{-i}) \le \lambda \cdot \mathrm{cost}(s^*) + \mu \cdot \mathrm{cost}(s). \tag{8.16} $$


REMARK 8.4.6. Lemma 8.4.3 proves that the atomic selfish routing game is
(5/3, 1/3)-smooth.


The following theorem shows that $(\lambda, \mu)$-smooth cost minimization games have
price of anarchy at most $\lambda/(1 - \mu)$ and enables extension theorems that yield the
same bound with respect to other solution concepts.


THEOREM 8.4.7. Let $G$ be a cost-minimization game as discussed above. Let
$s^* = (s_1^*, \ldots, s_k^*)$ be a strategy profile which minimizes cost(s) and suppose that $G$
is $(\lambda, \mu)$-smooth. Then the following price of anarchy bounds hold:

(1) If $s$ is a pure Nash equilibrium, then

$$ \mathrm{cost}(s) \le \frac{\lambda}{1 - \mu} \, \mathrm{cost}(s^*). $$

(2) Mixed Nash equilibria and (coarse) correlated equilibria: If $p$ is a distribution
over strategy profiles such that for all $i$ and $s_i'$

$$ \sum_s p_s C_i(s) \le \sum_s p_s C_i(s_i', s_{-i}), \tag{8.17} $$

then²

$$ \sum_s p_s \, \mathrm{cost}(s) \le \frac{\lambda}{1 - \mu} \, \mathrm{cost}(s^*). \tag{8.18} $$

(3) Sublinear regret: If the game $G$ is played $T$ times, and each player uses
a sublinear regret algorithm to minimize their cost, with player $i$ using
strategy $s_i^t$ in the $t$-th round and $s^t = (s_1^t, \ldots, s_k^t)$, then

$$ \frac{1}{T} \sum_{t=1}^{T} \mathrm{cost}(s^t) \le \frac{\lambda}{1 - \mu} \, \mathrm{cost}(s^*) + o(1). \tag{8.19} $$


PROOF. The proof of part (1) is essentially the same as that of Theorem 8.4.4,
and it is a special case of part (2).


Proof of (2):

$$\begin{aligned}
\sum_s p_s \, \mathrm{cost}(s) &= \sum_s p_s \sum_i C_i(s_i, s_{-i}) \\
&\le \sum_s p_s \sum_i C_i(s_i^*, s_{-i}) && \text{by (8.17)} \\
&\le \sum_s p_s \big( \lambda \cdot \mathrm{cost}(s^*) + \mu \cdot \mathrm{cost}(s) \big) && \text{by (8.16)} \\
&= \lambda \cdot \mathrm{cost}(s^*) + \mu \cdot \sum_s p_s \, \mathrm{cost}(s).
\end{aligned}$$

Rearranging yields (8.18).

Proof of (3): We have

$$ \frac{1}{T} \sum_{t=1}^{T} \mathrm{cost}(s^t) = \frac{1}{T} \sum_{t=1}^{T} \sum_i C_i(s_i^t, s_{-i}^t) \le \frac{1}{T} \sum_{t=1}^{T} \sum_i C_i(s_i^*, s_{-i}^t) + o(1), $$

where the inequality is the guarantee from the sublinear regret learning
algorithm. Using the smoothness inequality (8.16) to upper bound the
right-hand side yields

$$ \frac{1}{T} \sum_{t=1}^{T} \mathrm{cost}(s^t) \le \frac{1}{T} \sum_{t=1}^{T} \big( \lambda \cdot \mathrm{cost}(s^*) + \mu \cdot \mathrm{cost}(s^t) \big) + o(1). $$

Finally, rearranging, we get

$$ \frac{1}{T} \sum_{t=1}^{T} \mathrm{cost}(s^t) \le \frac{\lambda}{1 - \mu} \cdot \mathrm{cost}(s^*) + o(1), $$

which is (8.19).


2 The condition for $p$ to be a correlated equilibrium is that for all $i$, $s_i$, and $s_i'$, we have
$\sum_{s_{-i}} p_{(s_i, s_{-i})} C_i(s_i, s_{-i}) \le \sum_{s_{-i}} p_{(s_i, s_{-i})} C_i(s_i', s_{-i})$. This condition implies (8.17)
by taking $s_i' := s_i^*$ and summing over $s_i$. Note though that (8.17) is a weaker requirement, also
known as a coarse correlated equilibrium.
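
For concreteness, the argument for part (1) of Theorem 8.4.7 can be written out as follows. This is a sketch that mirrors the proof of Theorem 8.4.4; it assumes $\mu < 1$, so that dividing by $1 - \mu$ in the last step is legitimate.

```latex
\begin{align*}
\mathrm{cost}(s) &= \sum_i C_i(s_i, s_{-i}) \\
  &\le \sum_i C_i(s_i^*, s_{-i})
      && \text{($s$ is a Nash equilibrium)} \\
  &\le \lambda \cdot \mathrm{cost}(s^*) + \mu \cdot \mathrm{cost}(s)
      && \text{(by (8.16))}.
\end{align*}
Hence $(1 - \mu)\,\mathrm{cost}(s) \le \lambda \cdot \mathrm{cost}(s^*)$, that is,
$\mathrm{cost}(s) \le \frac{\lambda}{1 - \mu}\,\mathrm{cost}(s^*)$.
With $(\lambda, \mu) = (5/3, 1/3)$ this gives $\frac{5/3}{2/3} = \frac{5}{2}$.
```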


8.4.2. Application to atomic selfish routing. Using the fact that the
atomic selfish routing game is (5/3, 1/3)-smooth (by Lemma 8.4.3), we can apply
Theorem 8.4.7, Part (3), and obtain the following corollary.


COROLLARY 8.4.8. Let $G$ be a road network with affine latency functions. Suppose
that every day driver $i$ travels from $s_i$ to $t_i$, choosing his route $P_i^t$ on day $t$
using a sublinear regret algorithm (such as the Multiplicative Weights Algorithm
from §18.3.2). Then

$$ \frac{1}{T} \sum_{t=1}^{T} L(P^t) \le \frac{5}{2} L(P^*) + o(1). $$


REMARK 8.4.9. Since drivers are unlikely to know each other's strategies, this
corollary seems more applicable than Theorem 8.4.4.

For other applications of Theorem 8.4.7, see the exercises.


Notes 


In 2012, Elias Koutsoupias, Christos Papadimitriou, Noam Nisan, Amir Ronen, Tim
Roughgarden, and Eva Tardos won the Gödel Prize⁴ for “laying the foundations of
algorithmic game theory.” Two of the three papers cited for the prize are
concerned with the price of anarchy.



Elias Koutsoupias Christos Papadimitriou 


The “price of anarchy” concept was introduced by Koutsoupias and Papadimitriou
in 1999 [KP09], though in the original paper the relevant terminology was
“coordination ratio”. The term “price of anarchy” is due to Papadimitriou [Pap01]. The first
use of this style of analysis in congestion games was in 2000 [RT02].

An extensive discussion of the material on selfish routing, including
numerous references, can be found in the book by Roughgarden [Rou05]. See also
Chapters 17 and 18 of [Nis07]. Pigou’s example is discussed in his 1920 book [Pig20].


4 The Gödel Prize is an annual prize for outstanding papers in theoretical computer science.

Tim Roughgarden    Eva Tardos


The traffic model and definition of Nash flows are due to Wardrop, and the proof
that Nash flows exist is due to Beckmann, McGuire, and Winsten [BMW56]. The Braess
Paradox is from [Bra68]. [...] and [...] are due to Roughgarden and
Tardos [RT02]. A version of [...] under a convexity assumption was proved
by Roughgarden [Rou03]; this assumption was removed by Correa, Schulz, and
Stier-Moses [CSSM04]. The proofs of Theorem 8.1.8 and Theorem 8.1.15 presented here are
due to Correa et al. [CSSM04]. With suitable tolling, the inefficiency of Nash equilibria
in selfish routing can be eliminated. See, e.g., [FJM04].

The network formation results of §8.2 are due to Anshelevich et al. [ADK+08]. In
their paper, they introduce the notion of the price of stability of a game, in which the
optimal value of a global objective is compared with the value of this objective in
Nash equilibrium. The difference between the price of stability and the price
of anarchy is that the latter is concerned with the worst Nash equilibrium, whereas the
former is concerned with the best Nash equilibrium. In fact, Theorem 8.2.2 is a price of
stability result. See also the survey of network formation games by Jackson [Jac05] and
Chapter 19 of [Nis07].

The market sharing game of §8.3 is a special case of a class of games called utility
games introduced by Vetta [Vet02]. In these games, players must choose among a set of
locations and the social surplus is a function of the locations selected. For example, he
considers a facility location game in which service providers choose locations at which they
can locate their facilities, in response to customer demand that depends on the distribution
of customer locations. All of the games in Vetta’s class have price of anarchy at most 2.

The results on atomic selfish routing in §8.4 are due to Christodoulou and
Koutsoupias and to Awerbuch, Azar, and Epstein. The smoothness framework
and extension theorems described in §8.4.1 are due to Roughgarden [Rou09]. These results
synthesize a host of prior price of anarchy proofs and extensions that allow for weaker
assumptions on player rationality (e.g., [BHLR08], [BEDL10], [CK05a], [GMV05], [MV04], [Vet02]).

The price of anarchy and smoothness notions have been extended to settings of
incomplete information and to more complex mechanisms. For a taste of this topic, see [...],
and for a detailed treatment, see, e.g., [...].

For more on the price of anarchy and related topics, see Chapters 17-21 of [Nis07]
and the lecture notes by Roughgarden [Rou13].


[...] is due to [...]. Exercises 8.11, 8.12, and 8.13 are due to [Rou09].
Exercise 8.14 is from [KP09], and Exercise 8.15 is from [FLM+03].


Exercises 


8.1. Show that Theorem 8.1.8 holds in the presence of multiple traffic flows.
Specifically, let $G$ be a network where $r_i > 0$ units of traffic are routed
from source $s_i$ to destination $t_i$, for each $i = 1, \ldots, k$. Suppose that the
latency function on each edge $e$ is affine; that is, $\ell_e(x) = a_e x + b_e$, for
constants $a_e, b_e \ge 0$. Show that the price of anarchy is at most 4/3; that
is, the total latency in equilibrium is at most 4/3 that of the optimal flow.


8.2. Suppose that $\mathcal{L}$ is the set of all nonnegative, weakly increasing, concave
functions. Show that for this class of functions

$$ A_r(\mathcal{L}) \le \frac{4}{3}. $$

8.3. Let $G$ be a network where one unit of traffic is routed from a source $s$ to a
destination $t$. Suppose that the latency function on each edge $e$ is linear;
that is, $\ell_e(x) = a_e x$, for constants $a_e \ge 0$. Show that the price of anarchy
in such a network is 1. Hint: In (8.7) use the inequality $xy \le (x^2 + y^2)/2$.


8.4. Extend Theorem 8.1.15 to the case where there are multiple flows: Let $G$
be a network with latency functions in $\mathcal{L}$ and total flow $r_i$ from $s_i$ to $t_i$,
for $i = 1, \ldots, k$. Then the price of anarchy in $G$ is at most $A_r(\mathcal{L})$, where


8.5. Let $c > 0$ and suppose that $f_c$ is the function

$$ f_c(x) = \begin{cases} \dfrac{1}{c - x}, & 0 \le x < c, \\ \infty, & x \ge c. \end{cases} $$

Consider a selfish routing network where all latency functions are in $\mathcal{L}$,
where

$$ \mathcal{L} = \{ f_c(\cdot) \mid c > 0 \}. $$

Suppose that in equilibrium, for every edge $e$, the flow $F_e \le (1 - \beta) c_e$,
where the latency function on edge $e$ is $f_{c_e}(x)$. Show that the price of
anarchy is upper bounded by


8.6. Let $G$ be a selfish routing network in which $r$ units of traffic are routed
from source $s$ to destination $t$. Suppose that $\ell_e(\cdot)$ is the latency function
associated with edge $e \in E$. Consider another network $G'$ with exactly the
same network topology and the same amount of traffic being routed, but
where the latency function $\ell_e'(\cdot)$ on edge $e$ satisfies

$$ \ell_e'(x) := \tfrac{1}{2} \, \ell_e(x/2) \quad \forall e \in E. $$

This corresponds to doubling the capacity of the link. Suppose that $f^*$ is an
optimal flow in $G$ and $f'$ is an equilibrium flow in $G'$. Use Theorem 8.1.16
to prove that

$$ L_{G'}(f') \le L_G(f^*). $$


S 8.7. Show that the price of anarchy bound for the market sharing game from
§8.3 can be improved to $2 - 1/k$ when there are $k$ teams. Show that this
bound is tight.


S*8.8. Consider an auctioneer selling a single item via a first-price auction.⁵
Each of the $n$ bidders submits a bid, say $b_i$ for the $i$-th bidder, and, given
the bid vector $\mathbf{b} = (b_1, \ldots, b_n)$, the auctioneer allocates the item to the
highest bidder at a price equal to her bid. (The auctioneer employs some
deterministic tie-breaking rule.) Each bidder has a value $v_i$ for the item.
A bidder’s utility from the auction when the bid vector is $\mathbf{b}$ and her value
is $v_i$ is

$$ u_i[\mathbf{b} \mid v_i] := \begin{cases} v_i - b_i & \text{if } i \text{ wins the auction,} \\ 0 & \text{otherwise.} \end{cases} $$

Each bidder will bid in the auction so as to maximize her (expected) utility.
The expectation here is over any randomness in the bidder strategies.
The social surplus $V(\mathbf{b})$ of the auction is the sum of the utilities of the
bidders and the auctioneer revenue. Since the auctioneer revenue equals
the winning bid, we have

$$ V(\mathbf{b}) := \text{value of winning bidder}. $$

Show that the price of anarchy is at most $1 - 1/e$; that is, for $\mathbf{b}$ a Nash
equilibrium,

$$ \mathbb{E}[V(\mathbf{b})] \ge \Big( 1 - \frac{1}{e} \Big) \max_i v_i. $$

Hint: Consider instead what happens when bidder $i$ deviates from $b_i$ to the
distribution with density $f(x) = 1/(v_i - x)$, with support $[0, (1 - 1/e) v_i]$.


8.9. Consider a two-bidder, two-item auction, where the values of the bidders
are shown in Table 1.

What is the optimal (i.e., social surplus maximizing) allocation of items
to bidders? Suppose that the seller sells the items by asking the bidders
to submit one bid for each item and then running separate second-price
auctions for each item. (In a second-price auction, the item is allocated to
the bidder that bid the highest at a price equal to the second highest bid.)
Show that there is a pure Nash equilibrium in which the social surplus is
maximized. Consider the following alternative set of bids $b_1 = (0, 1)$ and
$b_2 = (1, 0)$. Show that these bids are a Nash equilibrium and that the price
of anarchy in this example is 2.


5 See Chapter 14 for a detailed introduction to auctions, and Theorem 14.7.1 for a
generalization of this result.

TABLE 1. The valuations of the bidders for each combination of items.

Items received    Bidder 1    Bidder 2
No items          0           0
Item 1            2           1
Item 2            1           2
Both items        2           2

8.10. Consider atomic selfish routing in a Pigou network with latency $x$ on the
top edge and latency 2 on the bottom edge shown in Figure 8.12. What is
the total latency for the optimal routing? Show that there are two equilibria
and that they have different costs.


FIGURE 8.12. Figure for Exercise 8.10. Two units of flow enter and leave the
network. The top edge has latency $n$ when it carries $n$ units of flow ($n$ an
integer); the bottom edge has constant latency 2 (independent of flow).

8.11. Prove the analogue of Theorem 8.4.7, Part (1), for games in which we are
interested in maximizing a global objective function such as social surplus:
Consider a $k$-player game $G$. Let $V(s)$ be a global objective function such
that

$$ V(s) \ge \sum_i u_i(s). \tag{8.1} $$

We say the game is $(\lambda, \mu)$-smooth for strategy profile $s' = (s_1', \ldots, s_k')$ if for
all strategy profiles $s = (s_1, \ldots, s_k)$

$$ \sum_i u_i(s_i', s_{-i}) \ge \lambda V(s') - \mu V(s). \tag{8.2} $$

Let $s^* = (s_1^*, \ldots, s_k^*)$ be a strategy profile which maximizes global objective
function $V(s)$, and let $s$ be a Nash equilibrium. Show that if $G$ is $(\lambda, \mu)$-smooth
for $s^*$, then

$$ V(s) \ge \frac{\lambda}{1 + \mu} V(s^*). $$

For example, Lemma 8.3.2 shows that the market sharing game is (1, 1)-smooth
for all strategy profiles $s'$. From this, we derived Theorem 8.3.4.

8.12. Prove the analogue of Theorem 8.4.7, Part (2), for games in which we are
interested in maximizing a global objective function such as social surplus:
Suppose that $V(\cdot)$ is a global objective function that satisfies (8.1) and let
$G$ be a game that is $(\lambda, \mu)$-smooth for $s^*$, where $s^*$ maximizes $V(\cdot)$. (See
(8.2).) Also, let $p$ be a distribution over strategy profiles that corresponds
to a correlated equilibrium so that

$$ \sum_s p_s u_i(s) \ge \sum_s p_s u_i(s_i^*, s_{-i}). \tag{8.3} $$

Show that

$$ \mathbb{E}_{s \sim p}[V(s)] \ge \frac{\lambda}{1 + \mu} V(s^*). $$


8.13. Prove the analogue of Theorem 8.4.7, Part (3), for games in which we are
interested in maximizing a global objective function such as social surplus:
Suppose that $V(\cdot)$ is a global objective function that satisfies (8.1) and let
$G$ be a game that is $(\lambda, \mu)$-smooth for $s^*$, where $s^*$ maximizes $V(\cdot)$. (See
(8.2).) Suppose the game is played $T$ times and each player uses a sublinear
regret algorithm to determine his strategy in round $t$. Recall that if we let
$s^t = (s_1^t, s_2^t, \ldots, s_k^t)$ be the strategy profile employed by the $k$ players in
round $t$, then the guarantee of the algorithm is that, for any $s_{-i}^t$, with
$1 \le t \le T$,

$$ \sum_{t=1}^{T} u_i(s_i^t, s_{-i}^t) \ge \max_{s \in S_i} \sum_{t=1}^{T} u_i(s, s_{-i}^t) - o(T). \tag{8.4} $$

Show that

$$ \frac{1}{T} \sum_{t=1}^{T} V(s^t) \ge \frac{\lambda}{1 + \mu} V(s^*) - o(1). $$


8.14. Suppose that there are n jobs, each job owned by a different player, and 
n machines. Each player chooses a machine to run its job on, and the 
cost that player incurs is the load on the machine, i.e., the number of jobs 
that selected that machine, since that determines the latency that player 
experiences. Suppose that it is desirable to minimize the maximum load 
(number of jobs) assigned to any machine. This is called the makespan 
of the allocation. Clearly it is a Nash equilibrium for each job to select a 
different machine, an allocation which achieves the optimal makespan of 1. 

• Show that it is a mixed-strategy Nash equilibrium for each player to
select a random machine.

• Show that the price of anarchy for this mixed Nash equilibrium (i.e.,
the expected makespan it achieves divided by the optimal makespan)
is O(log n/ log log n).


8.15. Consider the following network formation game: There are $n$ vertices each
representing a player. The pure strategy of a player consists of choosing
which other vertices to create a link to. A strategy profile induces a graph,
where each edge is associated with the vertex that “created” it. Given a
strategy profile $s$, the cost incurred by player $i$ is

$$ \mathrm{cost}_i(s) := \alpha \cdot n_i(s_i) + \sum_{j \ne i} d_s(i, j), $$

where $n_i(s_i)$ is the number of links $i$ created (each link costs $\alpha$ to create)
and $d_s(i, j)$ is the distance from $i$ to $j$ in the graph resulting from strategy
profile $s$.
• Show that if $\alpha \ge 2$, then the graph which minimizes $\sum_i \mathrm{cost}_i$ is a star,
whereas if $\alpha < 2$, then it is a complete graph.
• Show that for $\alpha \le 1$ or $\alpha \ge 2$, there is a Nash equilibrium with total
cost equal to that of the optimum graph.
• Show that for $1 < \alpha < 2$, there is a Nash equilibrium with total cost
at most 4/3 times that of the optimum graph.
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CHAPTER 9 


Random-turn games 


In Chapter 1 we considered combinatorial games, in which the right to move
alternates between players; and in Chapter 2 and Chapter 4 we considered
matrix-based games, in which both players (usually) declare their moves simultaneously
and possible randomness decides what happens next. In this chapter, we consider
some games which are combinatorial in nature, but the right to make the next move
is determined by a random coin toss.

Let $S$ be an $n$-element set, which will sometimes be called the board, and let
$f$ be a function from the $2^n$ subsets of $S$ to $\mathbb{R}$. A selection game is played as
follows: The first player selects an element of $S$, the second player selects one of
the remaining $n - 1$ elements, the first player selects one of the remaining $n - 2$
elements, and so forth, until all elements have been chosen. Let $S_1$ and $S_2$ signify
the sets chosen by the first and second players, respectively. Then player I receives
a payoff of $f(S_1)$ and player II receives a payoff of $-f(S_1)$. Thus, selection games
are zero-sum.

9.1. Examples 


EXAMPLE 9.1.1 (Random-turn Hex). Let $S$ be the set of hexagons on a
rhombus-shaped $L \times L$ hexagonal grid. Define $f(S_1)$ to be 1 if $S_1$ contains a crossing
connecting the two yellow sides, $-1$ otherwise. In this case, once $S_1$ contains a
yellow crossing or $S_2$ contains a blue crossing (which precludes the possibility of
$S_1$ having a yellow crossing), the outcome is determined and there is no need to
continue the game.


FIGURE 9.1. Random-turn Hex played on a 15 x 15 board. 




EXAMPLE 9.1.2 (Team captains). Two team captains are choosing baseball
teams from a finite set $S$ of $n$ players for the purpose of playing a single game against
each other. The payoff $f(S_1)$ for the first captain is the probability that the players
in $S_1$ (together with the first captain) will win against the players in $S_2$ (together
with the second captain). The payoff function may be very complicated (depending
on which players know which positions, which players have played together before,
which players get along well with which captain, etc.). Because we have not specified
the payoff function, this game is as general as the class of selection games.


EXAMPLE 9.1.3 (Recursive Majority). Suppose we are given a complete
ternary tree of depth $h$. Let $S$ be the set of leaves. In each step, a fair coin toss
determines which player selects a leaf to label. Leaves selected by player I are marked
with a + and leaves selected by player II are marked with a −. A parent node in
the tree acquires the same sign as the majority of its children. The player whose
mark is assigned to the root wins. Example 1.2.15 discusses the alternating-turn
version of this game.


FIGURE 9.2. Here player II wins; the circled numbers give the order of the moves. 


As we discussed in Chapter 1, determining optimal strategies in alternating-move
selection games, e.g., Hex, can be hard. Surprisingly, the situation is different
in random-turn selection games.


9.2. Optimal strategy for random-turn selection games 


A (pure) strategy for a given player in a random-turn selection game is a function
$M$ which maps each pair of disjoint subsets $(T_1, T_2)$ of $S$ to an element of
$T_3 := S \setminus (T_1 \cup T_2)$, provided $T_3 \ne \emptyset$. Thus, $M(T_1, T_2)$ indicates the element that
the player will pick if given a turn at a time in the game when player I has thus far
picked the elements of $T_1$ and player II has picked the elements of $T_2$.

Denote by $E(T_1, T_2)$ the expected payoff for player I at this stage in the game,
assuming that both players play optimally with the goal of maximizing expected
payoff. As is true for all finite perfect-information, two-player games, $E$ is well
defined, and one can compute¹ $E$ and the set of possible optimal strategies by
induction on the size of $T_3$. First, if $T_3 = \emptyset$, then $E(T_1, T_2) = f(T_1)$. Next, suppose
that we have computed $E(T_1, T_2)$ whenever $|T_3| \le k$. Then if $|T_3| = k + 1$ and
player I has the chance to move, player I will play optimally if and only if she
chooses an $s$ from $T_3$ for which $E(T_1 \cup \{s\}, T_2)$ is maximal. Similarly, player II
plays optimally if and only if he minimizes $E(T_1, T_2 \cup \{t\})$ at each stage. Hence

$$ E(T_1, T_2) = \frac{1}{2} \Big( \max_{s \in T_3} E(T_1 \cup \{s\}, T_2) + \min_{t \in T_3} E(T_1, T_2 \cup \{t\}) \Big). \tag{9.1} $$

We will see that the maximizing and the minimizing moves are actually the same.

The foregoing analysis also demonstrates a well-known fundamental fact about
finite, turn-based, perfect-information games: Both players have optimal pure
strategies.² (This contrasts with the situation in which the players play
“simultaneously” as they do in Rock-Paper-Scissors.)


1 This method is called dynamic programming.


THEOREM 9.2.1. The value of a random-turn selection game is the expectation 
of f(T) when a set T is selected randomly and uniformly among all subsets of S. 
Moreover, any optimal strategy for one of the players is also an optimal strategy 
for the other player. 


PROOF. For any player II strategy, player I can achieve the expected payoff
$\mathbb{E}[f(T)]$ by playing exactly the same strategy (since, when both players play the
same strategy, each element will belong to $S_1$ with probability 1/2, independently).
Thus, the value of the game is at least $\mathbb{E}[f(T)]$. However, a symmetric argument
applied with the roles of the players interchanged implies that the value is no more
than $\mathbb{E}[f(T)]$.

Since the remaining game in any intermediate configuration $(T_1, T_2)$ is itself a
random-turn selection game, it follows that for every $T_1$ and $T_2$

$$ E(T_1, T_2) = \mathbb{E}[f(T_1 \cup T)], $$

where $T$ is a uniform random subset of $T_3$. Thus, for every $s \in T_3$,

$$ E(T_1, T_2) = \frac{1}{2} \big( E(T_1 \cup \{s\}, T_2) + E(T_1, T_2 \cup \{s\}) \big). $$

Therefore, if $s \in T_3$ is chosen to maximize $E(T_1 \cup \{s\}, T_2)$, then it also minimizes
$E(T_1, T_2 \cup \{s\})$. We conclude that every optimal move for one of the players is an
optimal move for the other.


If both players break ties the same way, then the final $S_1$ is equally likely to be
any one of the $2^n$ subsets of $S$.
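
The recursion (9.1) and the conclusion of Theorem 9.2.1 can be checked on small boards. The following sketch represents subsets of $S = \{0, \ldots, n-1\}$ as bitmasks, evaluates (9.1) by memoized recursion, and compares the value with the uniform average of $f$. The function names and the example payoff `majority3` are illustrative assumptions; the code is exponential in $n$ and meant only for tiny instances.

```python
from fractions import Fraction
from functools import lru_cache


def solve(n, f):
    """Return (value, agree) for the random-turn selection game on {0, ..., n-1}.

    f maps a bitmask (player I's final set) to an integer payoff.  `agree`
    reports whether, in every position, the set of maximizing moves for
    player I equals the set of minimizing moves for player II.
    """
    agree = True

    @lru_cache(maxsize=None)
    def E(t1, t2):
        nonlocal agree
        free = [1 << j for j in range(n) if not (t1 | t2) & (1 << j)]
        if not free:
            return Fraction(f(t1))
        gain = {s: E(t1 | s, t2) for s in free}   # player I moves next
        loss = {s: E(t1, t2 | s) for s in free}   # player II moves next
        best_for_I = {s for s in free if gain[s] == max(gain.values())}
        best_for_II = {s for s in free if loss[s] == min(loss.values())}
        agree = agree and best_for_I == best_for_II
        return (max(gain.values()) + min(loss.values())) / 2  # equation (9.1)

    return E(0, 0), agree


def uniform_average(n, f):
    """Expectation of f(T) for T a uniformly random subset of {0, ..., n-1}."""
    return Fraction(sum(f(t) for t in range(1 << n)), 1 << n)


def majority3(t):
    """Win-or-lose payoff on a 3-element board: +1 if player I holds at least 2."""
    return 1 if bin(t).count("1") >= 2 else -1


value, agree = solve(3, majority3)
print(value, uniform_average(3, majority3), agree)
```

Any integer-valued payoff can be passed as `f`; the theorem predicts that the two printed values coincide and that the optimal moves of the two players agree.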

Theorem 9.2.1 is quite surprising. In the baseball team selection, for example,
one has to think very hard in order to play the game optimally, knowing that at 
each stage the opponent can capitalize on any miscalculation. Yet, despite all of 
that mental effort by the team captains, the final teams look no different than they 
would look if at each step both captains chose players uniformly at random. 

For example, suppose that there are only two players who know how to pitch 
and that a team without a pitcher always loses. In the alternating-turn game, a 
captain can always wait to select a pitcher until just after the other captain selects 
a pitcher. In the random-turn game, the captains must try to select the pitchers 
in the opening moves, and there is an even chance the pitchers will end up on the 
same team. 

Theorem 9.2.1 generalizes to random-turn selection games in which the player
to get the next turn is chosen using a biased coin. If player I gets each turn with
probability $p$, independently, then the value of the game is $\mathbb{E}[f(T)]$, where $T$ is
a random subset of $S$ for which each element of $S$ is in $T$ with probability $p$,
independently. The proof is essentially the same.


2 See also Theorem 6.1.7.


9.3. Win-or-lose selection games 


We say that a game is a win-or-lose game if $f(T)$ takes on precisely two values,
which we assume to be $-1$ and 1. If $S_1 \subset S$ and $s \in S$, we say that $s$ is pivotal for
$S_1$ if $f(S_1 \cup \{s\}) \ne f(S_1 \setminus \{s\})$. A selection game is monotone if $f$ is monotone; that
is, $f(S_1) \ge f(S_2)$ whenever $S_1 \supseteq S_2$. Hex is an example of a monotone, win-or-lose
game. For such games, the optimal moves have the following simple description.


LEMMA 9.3.1. In a monotone, win-or-lose, random-turn selection game, a first
move $s$ is optimal if and only if $s$ is an element of $S$ that is most likely to be pivotal
for a random-uniform subset $T$ of $S$. When the position is $(S_1, S_2)$, the move $s$ in
$S \setminus (S_1 \cup S_2)$ is optimal if and only if $s$ is an element of $S \setminus (S_1 \cup S_2)$ that is most
likely to be pivotal for $S_1 \cup T$, where $T$ is a random-uniform subset of $S \setminus (S_1 \cup S_2)$.


PROOF. This follows from monotonicity and the discussion of optimal strategies
preceding (9.1).

For win-or-lose games, such as Hex, the players may stop making moves af- 
ter the winner has been determined, and it is interesting to calculate how long a 
random-turn, win-or-lose, selection game will last when both players play optimally. 
Suppose that the game is a monotone game and that, when there is more than one 
optimal move, the players break ties in the same way. Then we may take the point 
of view that the playing of the game is a (possibly randomized) decision procedure 
for evaluating the payoff function f when the items in S are randomly allocated. 
Let $x$ denote the allocation of the items, where $x_i = \pm 1$ according to whether the
$i$-th item goes to the first or second player. We may think of the $x_i$ as input variables,
and the playing of the game is one way to compute $f(x)$. The number of
turns played is the number of variables of $x$ examined before $f(x)$ is computed. We
use some inequalities from the theory of Boolean functions to bound the average 
length of play. 


DEFINITION 9.3.2. Let $f(x)$ be a Boolean function on $n$ variables taking values
in $\{-1, 1\}$. The influence $I_i(f)$ of the variable $x_i$ on $f(x)$ is the probability
that flipping $x_i$ will change the value of $f(x)$, where $x$ is uniformly distributed on
$\{-1, 1\}^n$. Thus, recalling the notation $f(x_i, x_{-i}) := f(x)$,

$$I_i(f) = \frac{1}{2} E\big| f(1, x_{-i}) - f(-1, x_{-i}) \big|.$$

For monotone functions, which have $f(1, x_{-i}) \geq f(-1, x_{-i})$ for all $x_{-i}$, it follows
that

$$I_i(f) = \frac{1}{2} E\big[ f(1, x_{-i}) - f(-1, x_{-i}) \big] = E[x_i f(x)]. \qquad (9.2)$$

DEFINITION 9.3.3. Given an unknown vector $x = (x_1, \ldots, x_n)$, a decision tree
for calculating $f(x)$ is a procedure for deciding, given the values of the variables
already examined, which variable to examine next. The procedure ends when the
value of $f(x)$ is determined. For example, if $n = 3$ and $f(x)$ is the majority function,
then such a procedure might first examine $x_1$ and $x_2$. The third variable $x_3$ only
needs to be examined if $x_1 \neq x_2$.

LEMMA 9.3.4. Let $f(x) : \{-1, 1\}^n \to \{-1, 1\}$ be a monotone function. Consider
any decision tree for calculating $f(x)$, when $x$ is selected uniformly at random
from $\{-1, 1\}^n$. Then

$$E[\#\text{ variables examined}] \geq \Big[ \sum_i I_i(f) \Big]^2.$$


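PROOF. Using (9.2) and Cauchy-Schwarz, we have

$$\sum_i I_i(f) = E\Big[ \sum_i x_i f(x) \Big] = E\Big[ f(x) \sum_i x_i 1_{\{x_i \text{ examined}\}} \Big]$$

$$\leq \sqrt{ E[f(x)^2] \, E\Big[ \Big( \sum_{i:\, x_i \text{ examined}} x_i \Big)^2 \Big] }$$

$$= \sqrt{ E\Big[ \sum_{i:\, x_i \text{ examined}} x_i^2 \Big] } = \sqrt{ E[\#\text{ bits examined}] }.$$

The last equality is justified by noting that $E[x_i x_j 1_{\{x_i \text{ and } x_j \text{ both examined}\}}] = 0$ when
$i \neq j$, which holds since conditioned on $x_i$ being examined before $x_j$, conditioned
on the value of $x_i$, and conditioned on $x_j$ being examined, the expected value of $x_j$
is zero.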
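
As a small sanity check of Lemma 9.3.4, the following Python sketch evaluates both sides of the inequality for the three-variable majority function and the decision tree described in Definition 9.3.3. The function names (`majority`, `influence`, `examined_by_tree`) are illustrative choices, not notation from the text, and the influences are computed by brute-force enumeration rather than by any formula.

```python
from itertools import product

def majority(x):
    """Majority of three values in {-1, +1}."""
    return 1 if sum(x) > 0 else -1

def influence(f, n, i):
    """Probability that flipping coordinate i changes f(x), for uniform x."""
    changes = 0
    for x in product([-1, 1], repeat=n):
        y = list(x)
        y[i] = -y[i]
        if f(x) != f(tuple(y)):
            changes += 1
    return changes / 2 ** n

def examined_by_tree(x):
    """Decision tree of Definition 9.3.3: read x1 and x2, then x3 only if they differ."""
    return 2 if x[0] == x[1] else 3

n = 3
total_influence = sum(influence(majority, n, i) for i in range(n))
expected_examined = sum(examined_by_tree(x)
                        for x in product([-1, 1], repeat=n)) / 2 ** n

# Lemma 9.3.4 asserts expected_examined >= total_influence ** 2.
print(total_influence, expected_examined, total_influence ** 2)
```

Here each variable has influence 1/2, so the right-hand side is (3/2)^2 = 2.25, while this particular tree examines 2.5 variables on average.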


Lemma 9.3.4 implies that in a win-or-lose random-turn selection game,

$$E[\#\text{ turns}] \geq \Big[ \sum_i I_i(f) \Big]^2.$$


9.3.1. Length of play for random-turn Recursive Majority. To apply
Lemma 9.3.4 to Example 9.1.3, we need to compute the probability that flipping
the sign of a given leaf changes the overall recursive majority. For any given node, 
the probability that flipping its sign will change the sign of its parent is just the 
probability that the signs of the other two siblings are distinct, which is 1/2. 




FIGURE 9.3 


This holds all along the path to the root, so the probability that flipping the
sign of leaf $i$ will flip the sign of the root is just $I_i(f) = (1/2)^h$, where $h$ is the height
of the tree. Thus, since there are $3^h$ leaves,

$$E[\#\text{ turns}] \geq \Big[ \sum_i I_i(f) \Big]^2 = \Big( \frac{3}{2} \Big)^{2h}.$$
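
The influence computation above can be checked numerically for small trees. The sketch below is a brute-force illustration only: it assumes a ternary tree whose leaves are listed left to right, and the helper names `recursive_majority` and `leaf_influence` are hypothetical.

```python
from itertools import product

def recursive_majority(leaves):
    """Evaluate Recursive Majority on a ternary tree with 3**h leaf signs."""
    level = list(leaves)
    while len(level) > 1:
        # Replace each block of three siblings by its majority sign.
        level = [1 if sum(level[k:k + 3]) > 0 else -1
                 for k in range(0, len(level), 3)]
    return level[0]

def leaf_influence(h, i):
    """Probability that flipping leaf i flips the root, for uniform leaf signs."""
    n = 3 ** h
    flips = 0
    for x in product([-1, 1], repeat=n):
        y = list(x)
        y[i] = -y[i]
        if recursive_majority(x) != recursive_majority(y):
            flips += 1
    return flips / 2 ** n

h = 2
total = sum(leaf_influence(h, i) for i in range(3 ** h))
bound = total ** 2  # lower bound on E[# turns] from Lemma 9.3.4
print(total, bound)
```

For h = 2 every leaf has influence 1/4, giving the bound (9/4)^2 = (3/2)^4.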


Notes 


This chapter presents the work in [PSSW07]. Indeed, as we could not improve on the
exposition there (mostly due to Oded Schramm, Scott Sheffield and David Wilson), we
follow it almost verbatim. As noted in that paper, the game of Random-turn Hex was
proposed by Wendelin Werner on a hike. Lemma 9.3.4 is from O’Donnell and Servedio
[OS04]. Figure 9.1 and Figure 9.4 are due to David Wilson.

Figure 9.4 shows two optimally played games of Random-turn Hex. It is not known
what is the expected length of such a game on an $L \times L$ board. [PSSW07] shows that it
is at least $L^{3/2+o(1)}$, but the only upper bound known is the obvious one of $L^2$.


FIGURE 9.4. Random-turn Hex on boards of size 11 x 11 and 63 x 63 under 
(nearly) optimal play. (The caveat “nearly” is there because the probability 
that a hexagon is pivotal was estimated by Monte Carlo simulation.) 


Exercises 


9.1. Generalize the proof of Theorem 9.2.1 further so as to include the following
two games: 


(a) Restaurant selection: 
Two people (with opposite food preferences) want to select a dinner 
location. They begin with a map containing $2^n$ distinct points in $\mathbb{R}^2$,
indicating restaurant locations. At each step, the person who wins a 
coin toss may draw a straight line that divides the set of remaining 
restaurants exactly in half and eliminate all the restaurants on one 
side of that line. Play continues until one restaurant z remains, at 
which time player I receives payoff f(z) and player II receives — f(z). 


(b) Balanced team captains: 
Suppose that the captains wish to have the final teams equal in size 
(i.e., there are 2n players and we want a guarantee that each team 
will have exactly n players in the end). Then instead of tossing coins, 
the captains may shuffle a deck of 2n cards (say, with n red cards and 
n black cards). At each step, a card is turned over and the captain
whose color is shown on the card gets to choose the next player.


9.2. Recursive Majority on b-ary trees: Let $b = 2r + 1$, $r \in \mathbb{N}$. Consider
Recursive Majority on a b-ary tree of depth h. For each leaf, determine the
probability that flipping the sign of that leaf would change the overall
result (i.e., the influence of that leaf).


CHAPTER 10


Stable matching and allocation 


In 1962, David Gale and Lloyd Shapley published a seminal paper entitled 
“College Admissions and the Stability of Marriage” [GS62]. This led to a rich 
theory with numerous applications exploring the fundamental question of how to 
find stable matchings, whether they are of men with women, students with schools, 
or organ donors with recipients needing a transplant. In 2012, the Nobel Prize 
in Economics was awarded to Alvin Roth and Lloyd Shapley for their research on 
“the theory of stable allocations and the practice of market design.” In this chapter,
we describe and analyze stable allocation algorithms, including the Gale-Shapley 
algorithm for stable matching. 


10.1. Introduction 


Suppose there are n men and m women. Every man has a preference order 
over the m women, while every woman also has a preference order over the n men. 
A matching is a one-to-one mapping from a subset of the men to a subset of the 
women. A matching M is unstable if there exists a man and a woman who are
not matched to each other in M but prefer each other to their partners in M. (We
assume every individual prefers being matched to being unmatched.¹) Otherwise,
the matching is called stable. Clearly, in any stable matching, the number of
matched pairs is min(n, m).


FIGURE 10.1. An unstable pair. 


1 Thus, there are four kinds of instability: (1) Alice and Bob are both matched in M but 
prefer each other to their current matches, (2) Alice prefers Bob to her match in M and Bob is 
unmatched in M, (3) similarly, with roles reversed, (4) both Alice and Bob are unmatched by M. 


Consider the example shown in Figure 10.2 with three men x, y, and z, and
three women a, b, and c. Their preference lists are:
x: a > b > c,   y: b > c > a,   z: a > c > b,
a: y > z > x,   b: y > z > x,   c: x > y > z.


Then, x ↔ a, y ↔ b, z ↔ c is an unstable matching since z and a prefer each other
to their partners.
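
To make the definition of instability concrete, here is a short Python sketch that lists the unstable pairs of a given matching. It assumes the preference lists of the example as reconstructed above and that everyone is matched; the function name `blocking_pairs` and the dictionary encoding are illustrative choices only.

```python
def blocking_pairs(men_prefs, women_prefs, matching):
    """Return all (man, woman) pairs who prefer each other to their partners.

    matching maps each man to a woman; everyone is assumed to be matched.
    """
    partner_of = {w: m for m, w in matching.items()}
    pairs = []
    for m, prefs in men_prefs.items():
        for w in prefs:
            if w == matching[m]:
                break  # the remaining women rank below m's current partner
            ranking = women_prefs[w]
            if ranking.index(m) < ranking.index(partner_of[w]):
                pairs.append((m, w))
    return pairs

men_prefs = {"x": ["a", "b", "c"], "y": ["b", "c", "a"], "z": ["a", "c", "b"]}
women_prefs = {"a": ["y", "z", "x"], "b": ["y", "z", "x"], "c": ["x", "y", "z"]}

print(blocking_pairs(men_prefs, women_prefs, {"x": "a", "y": "b", "z": "c"}))
```

On the matching x ↔ a, y ↔ b, z ↔ c this reports the single pair (z, a), in agreement with the text.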



FIGURE 10.2. The figure shows three men and three women and their prefer- 
ence lists. For example, the green man y prefers b to c to a. 


In the next section, we show that stable matchings exist for any preference 
profile and present an efficient algorithm for finding such a matching. 


10.2. Algorithms for finding stable matchings 


The following algorithm, called the men-proposing algorithm, was intro- 
duced by Gale and Shapley. At any point in time, there is some number of tenta- 
tively matched pairs. 


(1) Initially all men and women are unmatched. 

(2) Each man proposes to his most preferred woman who has not rejected 
him yet (or gives up if he’s been rejected by all women). 

(3) Each woman is tentatively matched to her favorite among her proposers 
and rejects the rest. 

(4) Repeat steps (2) and (3) until a round in which there are no rejections. 
At that point the tentative matches become final. 
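
Steps (1) through (4) can be sketched in Python as follows. This is a minimal round-based illustration, assuming every woman ranks every man; the function name `men_proposing` and the dictionary encoding of preference lists are assumptions made for the example, and the preference lists are those of the three-by-three example in Section 10.1.

```python
def men_proposing(men_prefs, women_prefs):
    """Round-based men-proposing algorithm; returns a dict man -> woman."""
    rank = {w: {m: r for r, m in enumerate(prefs)}
            for w, prefs in women_prefs.items()}
    # Step (1): nobody is matched; each man starts at the top of his list.
    next_choice = {m: 0 for m in men_prefs}
    while True:
        # Step (2): each man proposes to his best woman who has not rejected him.
        proposals = {}
        for m, prefs in men_prefs.items():
            if next_choice[m] < len(prefs):
                proposals.setdefault(prefs[next_choice[m]], []).append(m)
        # Step (3): each woman keeps her favorite proposer and rejects the rest.
        rejected = False
        for w, suitors in proposals.items():
            best = min(suitors, key=lambda m: rank[w][m])
            for m in suitors:
                if m != best:
                    next_choice[m] += 1
                    rejected = True
        # Step (4): after a round with no rejections the matches become final.
        if not rejected:
            return {suitors[0]: w for w, suitors in proposals.items()}

men_prefs = {"x": ["a", "b", "c"], "y": ["b", "c", "a"], "z": ["a", "c", "b"]}
women_prefs = {"a": ["y", "z", "x"], "b": ["y", "z", "x"], "c": ["x", "y", "z"]}
print(men_proposing(men_prefs, women_prefs))
```

In this run x is rejected by a in the first round and by b in the second, and the third round has no rejections.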


REMARK 10.2.1. Note that if a man is tentatively matched to a woman in 
round k and the algorithm doesn’t terminate, then he necessarily proposes to her 
again in round k + 1. 


OBSERVATION 10.2.2. From the first time a woman is proposed to, she remains 
tentatively matched (and is permanently matched at the end). Moreover, each ten- 
tative match is at least as good as the previous one from her perspective. 


THEOREM 10.2.3. The men-proposing algorithm yields a stable matching.


FIGURE 10.3. Arrows indicate proposals; cross indicates rejection.

FIGURE 10.4. Stable matching is achieved in the second stage.


PROOF. The algorithm terminates because in every nonfinal round there is a
rejection, and there are at most nm rejections possible. When it terminates, it
clearly yields a matching, which we denote by M. To see that M is stable, consider
a man, Bob, and a woman, Alice, not matched to each other, such that Bob prefers
Alice to his match in M or Bob is single. This means that he was rejected by Alice
at some point before the algorithm terminated. But then, by Observation 10.2.2,
Alice is matched in M and prefers her match in M to Bob.


COROLLARY 10.2.4. In the case n = m, the stable matching is perfect; that 
is, all men and women are matched. 


REMARK 10.2.5. We could similarly define a women-proposing algorithm. 


10.3. Properties of stable matchings 


We say a woman j is attainable for a man i if there exists a stable matching
M with M(i) = j.


THEOREM 10.3.1. Let M be the stable matching produced by the men-proposing 
algorithm. Then 


(a) Every man is matched in M to his most preferred attainable woman. 
(b) Every woman is matched in M to her least preferred attainable man. 


PROOF. We prove (a) by contradiction. Suppose that M does not match each 
man with his most preferred attainable woman. Consider the first time during the 
execution of the algorithm that a man, say Bob, is rejected by his most preferred 
attainable woman, Alice, and suppose that Alice rejects Bob at that moment for 
David. Since this is the first time a man is rejected by his most preferred attainable 
woman, 


David likes Alice at least as much as his most preferred attainable woman. 
(10.1) 


Also, since Alice is Bob’s most preferred attainable woman, there is another
stable matching M′ in which they are matched. In M′, David is matched to someone
other than Alice. But now we have derived a contradiction: By (10.1), David likes
Alice more than his match in M′, and Alice prefers David to Bob. Thus M′ is
unstable. (See Figure 10.5.)

We also prove part (b) by contradiction. Suppose that in M, Alice ends up
matched to Bob, whom she prefers over her least preferred attainable man, Dan.
Then, there is another stable matching M′ in which Dan and Alice are matched,
and Bob is matched to a different woman, Carol. Then in M′, Alice and Bob are an
unstable pair: By part (a), in M, Bob is matched to his most preferred attainable
woman. Thus, Bob prefers Alice to Carol, and by assumption Alice prefers Bob to
Dan. (See Figure 10.6.)


FIGURE 10.5. This figure shows the contradiction that results from assuming
that some man, in this case, Bob, is the first to be rejected by his favorite
attainable woman, Alice, when running the men-proposing algorithm. M′ is
the stable matching in which Bob and Alice are matched. (Panels: during
execution of the men-proposing algorithm, Alice rejects Bob and becomes
tentatively matched with David; in M′, Bob is matched with Alice and David
with Claire.)


FIGURE 10.6. This figure shows the contradiction that results from assuming
that in the men-proposing algorithm some woman, Alice, does not end up
with her least preferred attainable, in this case Dan. M′ is the matching in
which Alice is matched to Dan. (Panels: in M, the result of the men-proposing
algorithm, Bob is matched with Alice, Bob’s most preferred attainable; in M′,
Bob is matched with Carol and Alice with Dan, Alice’s least preferred attainable.)


COROLLARY 10.3.2. If Alice is assigned to the same man in both the men- 
proposing and the women-proposing version of the algorithm, then this is the only 
attainable man for her. 


COROLLARY 10.3.3. The set of women (and men) who get matched is the same
in all stable matchings.

PROOF. Consider the set of women matched by M, the matching resulting
from the men-proposing algorithm. Suppose that one of these women, say Alice,
is unmatched in some stable matching M′. Then Bob, whom she was matched to
in M, prefers her to whomever he is matched to in M′, a contradiction. Since the
number of matched women in both matchings is the same, namely min(n, m), this
concludes the proof.


10.3.1. Preferences by compatibility. Suppose we seek stable matchings
for n men and n women with preference order determined by a matrix $A = (a_{i,j})_{n \times n}$
where all entries in each row are distinct and all entries in each column are distinct.
If in the $i$th row of the matrix we have

$$a_{i,j_1} > a_{i,j_2} > \cdots > a_{i,j_n},$$

then the preference order of man i is $j_1 > j_2 > \cdots > j_n$. Similarly, if in the $j$th
column we have

$$a_{i_1,j} > a_{i_2,j} > \cdots > a_{i_n,j},$$

then the preference order of woman j is $i_1 > i_2 > \cdots > i_n$. (Imagine that the
number $a_{i,j}$ represents the compatibility of man i and woman j.)


FIGURE 10.7. The left-hand figure shows a stable matching between red points
$(x_i)_{i=1}^n$ and blue points $(y_j)_{j=1}^n$ randomly placed on a torus. Preferences are
according to distance; the shorter the better. Thus, $a_{i,j} = M - \mathrm{dist}(x_i, y_j)$.
The right-hand figure shows the minimum weight matching between the red
points and the blue points.


LEMMA 10.3.4. In this case, there exists a unique stable matching. 


PROOF. By Theorem 10.3.1, we know that the men-proposing algorithm produces
a stable matching M in which each man obtains his most preferred attainable
partner. In all other stable matchings, each man obtains at most the same value
and at least one man obtains a lower value. Therefore, M is the unique maximizer
of $\sum_i a_{i,M(i)}$ among all stable matchings. Similarly, the women-proposing algorithm
produces a stable matching which maximizes $\sum_j a_{M^{-1}(j),j}$ among all stable
matchings. Thus, the stable matchings produced by the two algorithms are the
same. By Corollary 10.3.2, there exists a unique stable matching.


10.3.2. Truthfulness. Exercise 10.3 shows that if the men-proposing algorithm
is implemented, a woman might benefit by misrepresenting her preferences.
Our next goal is to show that in this setting, no man is incentivized to do so.


LEMMA 10.3.5. Let $\mu$ be the men-optimal stable matching² and let $\nu$ be another
matching. Denote by S the set of men who prefer their match in $\nu$ to their match
in $\mu$, i.e.,

$$S := \{ m \mid \nu(m) >_m \mu(m) \}. \qquad (10.2)$$

Then there is a pair (m, w) which is unstable for $\nu$, where $m \notin S$.


PROOF. We consider the execution of the men-proposing algorithm that generates
$\mu$.

Case 1: $\mu(S) \neq \nu(S)$ (i.e., the set of women matched to men in S is not the
same in $\mu$ and $\nu$): Let $w \in \nu(S) \setminus \mu(S)$ and let $m = \mu(w)$. Then (m, w) is unstable
for $\nu$: First, $m \notin S$, so m prefers w to $\nu(m)$. Second, w rejected $\nu(w)$ during the
execution of the algorithm, so w prefers m to $\nu(w)$.

Case 2: $\mu(S) = \nu(S) = W_0$: Every woman in $W_0$ receives and rejects a proposal
from her match in $\nu$. Let w be the last woman in $W_0$ to receive a proposal from a
man in S. Then w was tentatively matched to a man, say m, when she received this
last proposal. Observe that $m \notin S$; otherwise, at some point after being rejected
by w, he would have proposed to $\mu(m) \in W_0$, resulting in a later proposal in $W_0$.
We claim (m, w) is unstable for $\nu$: First, $w >_m \mu(m) \geq_m \nu(m)$. Second, $\nu(w)$
proposed to w and was rejected earlier than m, so w prefers m to $\nu(w)$.


FIGURE 10.8. Red edges are in the male-optimal matching $\mu$ and blue edges
are in the matching $\nu$. (Panels: Case 1 and Case 2, each showing men on the
left and women on the right.)


COROLLARY 10.3.6. Let $\mu$ denote the men-optimal stable matching. Suppose
that a set $S_0$ of men misrepresent their preferences. Then there is no stable matching
for the resulting preference profile where all men in $S_0$ obtain strictly better
matches than in $\mu$ (according to their original preference order).


2 That is, the stable matching that arises from the men-proposing algorithm.

PROOF. Suppose that $\nu$ was such a matching. Then S as defined in (10.2) contains
$S_0$. The pair (m, w) produced in Lemma 10.3.5 is unstable for both preference
profiles because $m \notin S$.


10.4. Trading agents 


The theory of stable matching concerns two-sided markets where decisions are 
made by both sides. Matching and allocation problems also arise in one-sided mar- 
kets, for instance, workers trading shifts or teams trading players. For concreteness, 
we’ll use the example of first-year graduate students being assigned offices. 

Consider a set of n grad students, each initially assigned a distinct office when
they arrive at graduate school. Each student has a total order over the offices.
Two people who prefer each other’s office would naturally swap. More generally,
any permutation $\pi : [n] \to [n]$ defines an allocation where person i receives office
$\pi(i)$ (i.e., the office originally assigned to person $\pi(i)$). Such an allocation is called
unstable if there is a nonempty subset $A \subset [n]$ and a permutation $\sigma : A \to A$
(that is not identical to $\pi$ on A) such that for each $i \in A$ where $\sigma(i) \neq \pi(i)$, person
i prefers $\sigma(i)$ to $\pi(i)$. Otherwise, $\pi$ is stable.

Is there always a stable allocation and, if so, how do we find it? The following 
top trading cycle algorithm finds such a stable allocation: 


Define $S_k$ inductively as follows:
Let $S_1 = [n]$. For each $k \geq 1$, as long as $S_k \neq \emptyset$:

• Let each person $i \in S_k$ point to her most preferred office in $S_k$, denoted
$f_k(i)$.

• The resulting directed graph (with a vertex for each person and an edge
from vertex i to vertex j if j’s office is i’s favorite in $S_k$) has one outgoing
edge (it could be a self-loop) from each vertex, so it must contain directed
cycles. Call their union $C_k$. Allocate according to these cycles; i.e., for
each $i \in C_k$, set $\pi(i) = f_k(i)$.

• Set $S_{k+1} = S_k \setminus C_k$.
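
The rounds of the top trading cycle algorithm can be sketched in Python as follows. This is only an illustration under the convention that office j is the one initially assigned to person j; the function name `top_trading_cycles` and the five-person preference lists are hypothetical and do not come from the text.

```python
def top_trading_cycles(prefs):
    """prefs[i] lists offices from best to worst for person i, where office j
    is the one initially assigned to person j. Returns a dict person -> office."""
    remaining = set(prefs)
    allocation = {}
    while remaining:
        # Each remaining person points to the owner of her favorite remaining office.
        points_to = {i: next(j for j in prefs[i] if j in remaining)
                     for i in remaining}
        # Collect every person lying on a directed cycle (the set C_k).
        on_cycle = set()
        for start in remaining:
            node = start
            for _ in range(len(remaining)):  # enough steps to land on a cycle
                node = points_to[node]
            while node not in on_cycle:      # trace that cycle once
                on_cycle.add(node)
                node = points_to[node]
        for i in on_cycle:
            allocation[i] = points_to[i]
        remaining -= on_cycle                # S_{k+1} = S_k \ C_k
    return allocation

prefs = {
    1: [2, 1, 3, 4, 5],
    2: [1, 2, 3, 4, 5],
    3: [1, 4, 3, 2, 5],
    4: [3, 4, 1, 2, 5],
    5: [1, 2, 5, 3, 4],
}
print(top_trading_cycles(prefs))
```

In this instance persons 1 and 2 swap in the first round; in the second round persons 3 and 4 swap and person 5 keeps her own office via a self-loop.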


LEMMA 10.4.1.
(1) The top trading cycle algorithm produces a stable allocation $\pi$.
(2) No person has an incentive to misreport her preferences.


PROOF. (1): Fix a subset A of students and a permutation $\sigma : A \to A$ that
defines an instability. Let $A_1 = \{ i \in A : \sigma(i) \neq \pi(i) \}$, and let k be minimal such
that there is a $j \in C_k \cap A_1$. Then $\sigma(j) \in S_k$, so j prefers $\pi(j)$ to $\sigma(j)$.

(2): Fix the reports of all people except person $\ell$, and suppose that $\ell$ is in $C_i$.
For any $j < i$ all people in $C_j$ prefer their assigned office to office $\ell$. Thus, person $\ell$
cannot obtain an office in $\cup_{j<i} C_j$ by misreporting her preferences. Since $\ell$ prefers
her assigned office $\pi(\ell)$ to all the remaining offices $S_i$, this concludes the proof.


REMARK 10.4.2. There is a unique stable allocation. See Exercise 10.13.


Notes 


The stable matching problem was introduced and solved in a seminal paper by David
Gale and Lloyd Shapley [GS62], though stable matching algorithms were developed and
used as early as 1951 to match interns to hospitals [Sta53]. The shortest proof of the
existence of stable marriages is due to Marilda Sotomayor [Sot96]. See Exercise The
results on truthfulness in were discovered by and independently by ;
The proof we present is ese Roth gives an example showing that there is
no mechanism to select a stable matching which incentivizes all participants to be truthful.
Dubins and Freedman present an example due to Gale where two men can falsify
their preferences so that in the men-proposing algorithm, one will end up better off and
the other’s match will not change.


FIGURE 10.9. The figure shows the first few rounds of the algorithm. (Panels:
Round I, Round II, Round III.)


David Gale Lloyd Shapley 


See the books by Roth and Sotomayor [RS92], Gusfield and Irving [GI89], and
Knuth for more information on the topic. For many examples of stable matching
in the real world, see [Rot15].

The trading agents problem was introduced by Shapley and Scarf [SS74]. They
attribute the top trading cycle algorithm to David Gale.
For a broader survey of these topics, see Chapter 10] by Schummer and Vohra.


Alvin Roth Marilda Sotomayor 


We learned about Exercise 10.9 from Alexander Holroyd. Exercise 10.11 is due to
Al Roth [Rot86]. Exercise 10.12 is from [RRVV93]. The idea of using hearts in pictorial
depictions of stable matching algorithms (as we have done in Figure 10.5 and Figure 10.6)
is due to Stephen Rudich.

Figure 10.7 is from [HPPS09], where the distribution of distances for stable matchings
on the torus is studied. An extension to stable allocation (see Figure 10.10) was analyzed
in [HHP06].


FIGURE 10.10. Given n random points on the torus, there is a unique sta- 
ble allocation that assigns each point the same area where preferences are 
according to distance. 


Exercises 


10.1. There are three men, called a, b, c, and three women, called x, y, z, with the
following preference lists (most preferred on left):


for a: x > y > z,   for x: c > b > a,
for b: y > x > z,   for y: a > b > c,
for c: y > x > z,   for z: c > a > b.


Find the stable matchings that will be produced by the men-proposing and
by the women-proposing algorithm.


10.2. Consider an instance of the stable matching problem, and suppose that M 
and M’ are two distinct stable matchings. Show that the men who prefer 
their match in M to their match in M’ are matched in M to women who 
prefer their match in M’ to their match in M. 


10.3. Give an instance of the stable matching problem in which, by lying about 
her preferences during the execution of the men-proposing algorithm, a 
woman can end up with a man that she prefers over the man she would 
have ended up with had she told the truth. 


10.4. Consider a stable matching instance with n men and n women. Show that 
there is no matching (stable or not) that all men prefer to the male-optimal 
stable matching. Hint: In the men-proposing algorithm, consider the last 
woman to receive an offer. 


10.5. Consider the setting of where the preferences are determined by 
a matrix. The Greedy Algorithm for finding a matching chooses the (i, j)
for which $a_{i,j}$ is maximum, matches man i to woman j, removes row i and
column j from the matrix, and repeats inductively on the resulting matrix.
Show that the Greedy Algorithm finds the unique stable matching in this 
setting. Show also that the resulting stable matching is not necessarily a 
maximum weight matching. 


10.6. | Consider an instance of the stable matching problem with n men and m 
women. Define a (partial) matching to be simple if the only unstable pairs 
involve an unmatched man. There exist simple matchings (e.g., the empty 
matching). Given a matching M, let $r_j$ be the number of men that woman
j prefers to her match in M (with $r_j = n$ if she is unmatched). Let M*
be a simple matching with minimum $\sum_j r_j$. Show that M* is stable.


10.7. Show that there is not necessarily a solution to the stable roommates 
problem: In this problem, there is a set of 2n people, each with a total 
preference order over all the remaining people. A matching of the people 
(each matched pair will become roommates) is stable if there is no pair of 
people that are not matched that prefer to be roommates with each other 
over their assigned roommate in the matching. 


10.8. Show that the Greedy Algorithm (defined in Exercise 10.5) gives a solution
to the stable roommates problem when preferences are given by a matrix.


10.9. Consider 2n points in the plane, n red and n blue. Alice and Bob play the
following game, in which they alternate moves starting with Alice. Alice
picks a point $a_1$ of either color, say red. Then Bob picks a point $b_1$ of the
other color, in this case, blue. Then Alice picks a red point $a_2$, but this
point must be closer to $b_1$ than $a_1$ is. They continue like this, alternating,
with the requirement that the $i$th point $a_i$ that Alice picks is closer to $b_{i-1}$
than $a_{i-1}$ was, and the $i$th point $b_i$ that Bob picks is closer to $a_i$ than
$b_{i-1}$ was. The first person who can no longer pick a point that is closer to
the other one’s point loses. Show that the following strategy is a winning
strategy for Bob: At each step pick the point $b_i$ that is matched to $a_i$
in the unique stable matching for the instance, where each point prefers
points of the other color that are closer to it.


10.10. Consider using stable matching in the National Resident Matching 
Program, for the problem of assigning medical students (as residents) to 
hospitals. In this setting, there are n hospitals and m students. Each 
hospital has a certain number of positions for residents, say k; for hospital 
i. Each hospital has a ranking of all the students, and each student has a 
ranking of all the hospitals. Given an assignment of students to hospitals, 
a pair (H,s) is unstable if hospital H prefers student s to one of its 
assigned students (or has an unfilled slot), and s prefers hospital H to his 
current assignment. Describe an algorithm for finding a stable assignment 
(e.g., by reducing it to the stable matching problem). 


10.11. In the setting of the previous problem, show that if hospital H has at least 
one unfilled slot, then the set of students assigned to H is the same in all 
stable assignments. 


10.12. Consider the following integer programming³ formulation of the stable
matching problem. To describe the program, we use the following notation.
Let m be a particular man and w a particular woman. Then $j >_m w$ represents
the set of all women j that m prefers over w, and $i >_w m$ represents
the set of all men i that w prefers over m. In the following program the
variable $x_{i,j}$ will be selected to be 1 if man i and woman j are matched in
the matching selected:


3 Integer programming is linear programming in which the variables are required to take
integer values. See Appendix A for an introduction to linear programming.

$$\begin{aligned}
\text{maximize} \quad & \sum_{i,j} x_{i,j} \\
\text{subject to} \quad & \sum_j x_{m,j} \leq 1 && \text{for all men } m, \qquad (10.1) \\
& \sum_i x_{i,w} \leq 1 && \text{for all women } w, \\
& \sum_{j >_m w} x_{m,j} + \sum_{i >_w m} x_{i,w} + x_{m,w} \geq 1 && \text{for all pairs } (m, w), \\
& x_{m,w} \in \{0, 1\} && \text{for all pairs } (m, w).
\end{aligned}$$


• Prove that this integer program is a correct formulation of the stable
matching problem.

• Consider the relaxation of the integer program that allows fractional
stable matchings. It is identical to the above program, except that
instead of each $x_{m,w}$ being either 0 or 1, $x_{m,w}$ is allowed to take
any real value in [0, 1]. Show that the following program is the dual
program to the relaxation of (10.1).

$$\begin{aligned}
\text{minimize} \quad & \sum_i \alpha_i + \sum_j \beta_j - \sum_{i,j} \gamma_{i,j} \\
\text{subject to} \quad & \alpha_m + \beta_w - \sum_{j <_m w} \gamma_{m,j} - \sum_{i <_w m} \gamma_{i,w} - \gamma_{m,w} \geq 1 && \text{for all pairs } (m, w), \\
& \alpha_i, \beta_j, \gamma_{i,j} \geq 0 && \text{for all } i \text{ and } j.
\end{aligned}$$


• Use complementary slackness to show that every feasible fractional
solution to the relaxation of (10.1) is optimal and that setting

$$\alpha_m = \sum_j x_{m,j} \quad \text{for all } m,$$

$$\beta_w = \sum_i x_{i,w} \quad \text{for all } w,$$

and

$$\gamma_{i,j} = x_{i,j} \quad \text{for all } i, j$$

is optimal for the dual program.


10.13. Show that there is a unique stable allocation in the sense discussed in 
Section Hint: Use a proof by contradiction, considering the first $C_k$
in the top trading cycle algorithm where the allocations differ.
10.14. Consider a set of n teams, each with 10 players, where each team owner
has a ranking of all 10n players. Define a notion of stable allocation in this
setting (as in Section 10.4) and show how to adapt the top trading cycle
algorithm to find a stable allocation. We assume that players’ preferences 
play no role. 


10.15. A weaker notion of instability than the one discussed in Section re- 
quires that no set of graduate students can obtain better offices than they 
are assigned in $\pi$ by reallocating among themselves the offices allocated to
them in $\pi$. Show that this follows from stability as defined in Section 10.4.
Note that the converse does not hold. For example, if there are two people 
who both prefer the same office, the only stable allocation is to give that 
office to its owner, but the alternative is also weakly stable.


CHAPTER 11


Fair division 


A Jewish town had a shortage of men for wedding purposes, so 
they had to import men from other towns. One day a groom-to- 
be arrived on a train. As he disembarked, one lady proclaimed, 
“He’s a perfect fit for my daughter!” Another lady disagreed, 
“No, he’s a much better fit for my daughter!” 

A rabbi was called to decide the matter. After hearing both 
ladies, he said, “Each of you has good arguments for why your 
daughter should be the one to marry this man. Let’s cut him 
in two and give each of your daughters half of him.” One of 
the ladies replied, “That sounds fair.” The rabbi immediately 
declared, “That’s the real mother-in-law!” 


Suppose that several people need to divide an asset, such as a plot of land, 
between them. One person may assign a higher value to a portion of the asset than 
another. This is often illustrated with the example of dividing a cake. 


11.1. Cake cutting 


FIGURE 11.1. Two bears sharing a cake. One cuts; the other chooses. 


The classical method for dividing a cake fairly between two people is to have
one cut and the other choose. This method ensures that each player can get at 
least half the cake according to his preferences; e.g., a player who loves icing most 
will take care to divide the icing equally between the two pieces. 


FIGURE 11.2. This figure shows a possible way to cut a cake into five pieces
of lengths $x_1, \ldots, x_5$. The $i$th piece is $B_i = \big[ \sum_{k=1}^{i-1} x_k, \sum_{k=1}^{i} x_k \big)$. If the $i$th piece
goes to player j (i.e., $A_j := B_i$), then his value for this piece is $\mu_j(B_i)$.


To divide a cake between more than two players, we first model the cake as
the unit interval and assume that for each $i \in \{1, \ldots, n\}$, there is a distribution
function $F_i(x)$ representing player i’s value for the interval $[0, x]$. (See Figure 11.2
for a possible partition of the cake.) We assume these functions are continuous. Let
$\mu_i(A)$ be the value player i assigns to the set $A \subset [0, 1]$; in particular, $\mu_i([a, b]) =
F_i(b) - F_i(a)$. We assume that $\mu_i$ is a probability measure.


DEFINITION 11.1.1. A partition $A_1, \ldots, A_n$ of the unit interval is called a fair
division¹ if $\mu_i(A_i) \geq 1/n$ for all i. A crucial issue is which sets are allowed in the
partition. For now, we assume that each $A_i$ is an interval.


REMARK 11.1.2. The assumption that $F_i$ is continuous is key since a discontinuity
would represent an atom in the cake, and might preclude fair division.


Moving-knife Algorithm for fair division of a cake among n people
• Move a knife continuously over the cake from left to right
until some player yells “Stop!”
• Give that player the piece of cake to the left of the knife.
• Iterate with the other n − 1 players and the remaining cake.


DEFINITION 11.1.3. The safe strategy for a player i is defined inductively as 
follows. If n = 1, take the whole cake. Otherwise, in the first round, i should yell
“Stop” as soon as a 1/n portion of the cake is reached according to his measure. 
If someone else yells first, player i employs the safe strategy in the (n — 1)-person 
game on the remaining cake. 


LEMMA 11.1.4. Any player who plays the safe strategy is guaranteed to get a 
piece of cake that is worth at least 1/n of his value for the entire cake. 


PROOF. Any player i who plays the safe strategy either receives a piece of cake 
worth 1/n of his value in the first round or has value at least (n — 1)/n for the 
remaining cake. In the latter case, by induction, i receives at least 1/(n — 1) of 
his value for the remaining cake and hence at least 1/n of his value for the whole 
cake. 
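
The Moving-knife Algorithm with safe strategies can be simulated numerically. The following is a minimal sketch, not part of the original text: the names `stop_point` and `moving_knife` are hypothetical, each player is assumed to be described by a continuous, nondecreasing distribution function $F_i$ on $[0,1]$, stopping points are found by bisection, and ties between players who would yell at the same moment are broken by player index (an arbitrary choice).

```python
from typing import Callable, List, Tuple


def stop_point(F: Callable[[float], float], left: float, share: float,
               tol: float = 1e-12) -> float:
    """Approximate the first t in [left, 1] with F(t) - F(left) >= share.

    Assumes F is continuous and nondecreasing, so bisection applies.
    """
    lo, hi = left, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if F(mid) - F(left) >= share:
            hi = mid
        else:
            lo = mid
    return hi


def moving_knife(cdfs: List[Callable[[float], float]]) -> List[Tuple[float, float]]:
    """Return pieces[i] = (a, b), the interval player i receives when every
    player follows the safe strategy of Definition 11.1.3."""
    n = len(cdfs)
    remaining = list(range(n))
    pieces = [(0.0, 0.0)] * n
    left = 0.0
    while len(remaining) > 1:
        m = len(remaining)
        # With m players left, player i yells once the piece to the left of
        # the knife is worth 1/m of his value for the remaining cake.
        stops = {i: stop_point(cdfs[i], left, (cdfs[i](1.0) - cdfs[i](left)) / m)
                 for i in remaining}
        winner = min(remaining, key=lambda i: stops[i])  # first to yell
        pieces[winner] = (left, stops[winner])
        left = stops[winner]
        remaining.remove(winner)
    pieces[remaining[0]] = (left, 1.0)  # the last player takes what is left
    return pieces


# Three illustrative players (assumed for this example only).
example_cdfs = [lambda x: x, lambda x: x * x, lambda x: x ** 0.5]
example_pieces = moving_knife(example_cdfs)
```

In this example each player's own value for his piece, $F_i(b) - F_i(a)$, is at least $1/3$, as Lemma 11.1.4 guarantees.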


1 This is also known as proportional.

[Figure 11.3: the cake, drawn with one row each for the value to player I, the
value to player II, and the value to player III.]


FIGURE 11.3. This figure shows an example of how the Moving-knife Algorithm
might evolve with three players. The knife moves from left to right.
Player I takes the first piece, then II, then III. In the end, player I is envious
of player III.


While this cake-cutting algorithm guarantees a fair division if all participants 
play the safe strategy, it is not envy-free. It could be, when all is said and done, 
that some player would prefer the piece someone else received. See Figure [11.3] for 
an example. 


11.1.1. Cake cutting via Sperner’s Lemma. Let $\mu_1,\dots,\mu_n$ and $F_1,\dots,F_n$
be as above. In this section, we will show that there is a partition of the cake $[0,1]$
into $n$ intervals that is envy-free, and hence fair, under the following assumption.


ASSUMPTION 11.1.5. Each of the $n$ people prefers any piece of cake to no piece;
i.e., $\mu_i(A) > 0$ for all $i$ and any interval $A \ne \emptyset$.


We start by presenting an algorithm that constructs an $\epsilon$-envy-free partition.


DEFINITION 11.1.6. A partition $A_1,\dots,A_n$ is $\epsilon$-envy-free if for all $i, j$ we have
$\mu_i(A_j) \le \mu_i(A_i) + \epsilon$.


This means that player $i$, who was assigned interval $A_i$, does not prefer any
other piece by more than $\epsilon$.

11.1.1.1. The construction. Let $e_i$ denote the $i$th standard basis vector and let
$\Delta(e_1,e_2,\dots,e_n)$ be the convex hull of $e_1,\dots,e_n$. Each point $(x_1,\dots,x_n)$ in the
simplex $\Delta(e_1,e_2,\dots,e_n)$ describes a partition of the cake (see Figure 11.2) where
$A_i$ is the piece of cake allocated to player $i$.

By Lemma 5.4.5, for any $\eta > 0$, there is a subdivision $\Gamma$ of $\Delta(e_1,e_2,\dots,e_n)$
for which all simplices in $\Gamma$ have diameter less than $\eta$. By Corollary 5.4.6, there is
a proper coloring² with colors $\{c_1,\dots,c_n\}$ of the vertices of $\Gamma$. If a vertex $v$ has
color $c_i$, we will say that player $i$ owns that vertex. See Figure 11.4.

Next, construct a Sperner labeling $\ell(\cdot)$ of the vertices in the subdivision as
follows: Given a vertex $x = (x_1,\dots,x_n)$ in $\Gamma$, define $B_i = B_i(x) = \bigl[\sum_{k=1}^{i-1} x_k, \sum_{k=1}^{i} x_k\bigr]$.
(Again, see Figure 11.2.) If $x$ is owned by player $j$ and $\mu_j(B_k)$ is maximal among
$\mu_j(B_1),\dots,\mu_j(B_n)$, then $\ell(x) = k$. In other words, $\ell(x) = k$ if $B_k$ is player $j$’s
favorite piece among the pieces defined by $x$. The fact that $\ell(\cdot)$ is a valid Sperner
labeling follows from Assumption 11.1.5. See Figure 11.5.
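
The labeling step can be made concrete with a short sketch. This is illustrative only: the helper names `pieces_from_point` and `sperner_label` are hypothetical, the owner of the vertex is assumed to be given through his distribution function $F_j$, and ties between equally valued pieces are broken toward the smallest index (the text leaves the tie-breaking unspecified).

```python
from itertools import accumulate
from typing import Callable, List, Sequence, Tuple


def pieces_from_point(x: Sequence[float]) -> List[Tuple[float, float]]:
    """Intervals B_1(x), ..., B_n(x) cut from [0, 1] by a point x of the simplex."""
    cuts = [0.0] + list(accumulate(x))
    return [(cuts[k], cuts[k + 1]) for k in range(len(x))]


def sperner_label(x: Sequence[float], F_owner: Callable[[float], float]) -> int:
    """1-based index k of the piece B_k(x) that the owner of x values most.

    F_owner is the owner's distribution function, so the value of [a, b]
    is F_owner(b) - F_owner(a). Ties go to the smallest index.
    """
    values = [F_owner(b) - F_owner(a) for a, b in pieces_from_point(x)]
    return 1 + max(range(len(values)), key=lambda k: values[k])


# A vertex on the face x_2 = 0, owned by a player with F(t) = t^2 (assumed).
label = sperner_label((0.5, 0.0, 0.5), lambda t: t * t)
```

Because an empty piece has value 0 while some nonempty piece has positive value under Assumption 11.1.5, a vertex with $x_i = 0$ never receives label $i$, which is the Sperner boundary condition.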


2 A coloring is proper if any two vertices in the same simplex $\Delta_i \in \Gamma$ are assigned different
colors.

FIGURE 11.4. This picture shows the coloring of a subdivision $\Gamma$ for three
players. Each simplex in $\Gamma$ has one black vertex (owned by player I), one
purple vertex (owned by player II), and one green vertex (owned by player
III).


[Figure 11.5: a triangle whose corner $(1,0,0)$ has label 1; one side is annotated
“vertices on this side have $x = (x_1, 0, x_3)$”, and another annotation reads
“labels 2 or 3”.]


FIGURE 11.5. The coordinates $x = (x_1, x_2, x_3)$ of a vertex represent a possible
partition of the cake. The Sperner label of a vertex owned by a particular
player is the index of the piece that player would choose given that partition
of the cake. Notice that by Assumption 11.1.5, if, say, $x_i = 0$, then the label
of vertex $x$ is not $i$.


Finally, we apply Sperner’s Lemma, from which we conclude that there is a
fully labeled simplex in $\Gamma$.


THEOREM 11.1.7. Let $\epsilon > 0$. There exists $\eta$ such that if the maximal diameter
of all simplices in $\Gamma$ is less than $\eta$, then any vertex $x$ of a fully labeled simplex in $\Gamma$
determines an $\epsilon$-envy-free partition.


PROOF. Let $\Delta^* = \Delta(v_1,\dots,v_n)$ be a fully labeled simplex in $\Gamma$, with $v_i$ owned
by player $i$. Let $x := v_1$ determine the partition $B_1,\dots,B_n$. Write $\pi(k) := \ell(v_k)$.
The fact that $\Delta^*$ is fully labeled means that $\pi$ is a permutation. For every $k$, assign
player $k$ the piece $A_k := B_{\pi(k)}(x)$. This will be the $\epsilon$-envy-free partition (provided $\eta$
is sufficiently small).

Clearly $A_1$ is the piece preferred by player 1. Given another player $j$, use $v_j$ to
construct a partition $B'_1,\dots,B'_n$. Observe that, for all $k$, the endpoints of $B'_k$ are
within $n\eta$ of the endpoints of $B_k$. By uniform continuity of $F_j$, there is $\delta > 0$ such
that

$|t - t'| < \delta \implies |F_j(t) - F_j(t')| < \frac{\epsilon}{4}$.

Thus, we will choose $\eta = \delta/n$, from which we can conclude

$|\mu_j(B'_k) - \mu_j(B_k)| < \frac{\epsilon}{2}$

for all $k$. Since

$\mu_j(B'_{\pi(j)}) \ge \mu_j(B'_{\pi(k)})$,

by the triangle inequality,

$\mu_j(B_{\pi(j)}) \ge \mu_j(B_{\pi(k)}) - \epsilon$.

The last part of the proof is illustrated in Figure 11.6.


[Figure 11.6: the fully labeled simplex $\Delta^*$ with vertices $v_1 = x = (x_1, x_2, x_3)$,
owned by the purple player (player 1), with $\ell(v_1) = 2 = \pi(1)$; $v_2$, owned by the
black player (player 2), with $\ell(v_2) = 1 = \pi(2)$; and $v_3$, owned by the green
player (player 3), with $\ell(v_3) = 3 = \pi(3)$. Below the simplex, the partitions
$B_1, B_2, B_3$ defined by $v_1$ and by $v_2$ are drawn one above the other.]


FIGURE 11.6. This figure illustrates the last part of the proof of
Theorem 11.1.7. The big simplex at the top is a blown up version of $\Delta^*$, the
fully labeled simplex that is used to show that there is an $\epsilon$-envy-free partition.
In this figure, the distances between the vertices $v_1$, $v_2$ and $v_3$ are at
most $\eta$. The partition is determined by $v_1 = (x_1, x_2, x_3)$.


COROLLARY 11.1.8. If for each $i$ the distribution function $F_i$ defining player $i$’s
values is strictly increasing and continuous, then there exists an envy-free partition
into intervals.


PROOF. By the continuity of the $F_i$’s, for every permutation $\pi$ the set

$\Lambda_\pi(\epsilon) = \{x \in \Delta : (B_{\pi(1)}(x),\dots,B_{\pi(n)}(x)) \text{ is } \epsilon\text{-envy-free}\}$

is closed. The theorem shows that

$\Lambda(\epsilon) = \bigcup_{\pi \in S_n} \Lambda_\pi(\epsilon)$

is closed, nonempty, and monotone decreasing as $\epsilon \downarrow 0$. Thus,

$\exists\, x^* \in \bigcap_k \Lambda(1/k)$, so $\forall k\ \exists \pi_k$ s.t. $x^* \in \Lambda_{\pi_k}(1/k)$.

Finally, since some $\pi \in S_n$ repeats infinitely often in $\{\pi_k\}_{k \ge 1}$, the partition $A_i :=
B_{\pi(i)}(x^*)$ is envy-free.


11.2. Bankruptcy 


A debtor goes bankrupt. His total assets are less than his total debts. How 
should his assets be divided among his creditors? 


DEFINITION 11.2.1. A bankruptcy problem is defined by the total available
assets $A$ and the claims $c_1,\dots,c_n$ of the creditors, where $c_i$ is the claim of the
$i$th creditor, with $C := \sum_i c_i > A$. A solution to the bankruptcy problem is an
allocation $a_i$ to each creditor, where $a_i \le c_i$ and $\sum_i a_i = A$.


One natural solution is proportional division, where each creditor receives
the same fraction $A/C$ of their claim; i.e., $a_i = c_i A/C$.

However, consider a problem of partitioning a garment³ between two people,
one claiming half the garment and the other claiming the entire garment. Since
only half of the garment is contested, it’s reasonable to partition that half between
the two claimants, and assign the uncontested half to the second claimant. For
$A = 1$, $c_1 = 0.5$, and $c_2 = 1$, this yields $a_1 = 0.25$ and $a_2 = 0.75$. This solution was
proposed in the Talmud, an ancient Jewish text.


FIGURE 11.7. The question of how to split a talit is discussed in Tractate 
Baba Metzia, Chapter 1, Mishnah 1. 


3 Or a plot of land.

[Figure 11.8: a diagram of the garment showing the claims $c_1$ and $c_2$ and the
resulting allocation.]


FIGURE 11.8. This picture shows the partitioning proposed by the Talmud.


FIGURE 11.9. Three widows in dispute over how the estate of their husband 
should be divided among them. 


Formally, this is the principle of equal division of contested amounts,
which we will refer to as the garment rule for $n = 2$: Since $(A - c_1)_+$ is not
contested by 1 and $(A - c_2)_+$ is not contested by 2, each claimant receives the
portion that the other does not contest (his “uncontested portion”) and half of
the contested portion. See Figure 11.8.
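
As a small worked sketch of the garment rule (the function name `garment_rule` is hypothetical, and exact rational arithmetic via `fractions.Fraction` is a choice made here, not something the text prescribes):

```python
from fractions import Fraction


def garment_rule(c1, c2, A):
    """Equal division of the contested amount between two claimants.

    Assumes a bankruptcy problem, i.e. 0 <= A <= c1 + c2.
    """
    c1, c2, A = Fraction(c1), Fraction(c2), Fraction(A)
    conceded_by_1 = max(A - c1, Fraction(0))  # (A - c1)_+ : not contested by 1
    conceded_by_2 = max(A - c2, Fraction(0))  # (A - c2)_+ : not contested by 2
    contested = A - conceded_by_1 - conceded_by_2
    a1 = conceded_by_2 + contested / 2  # what 2 concedes, plus half the rest
    a2 = conceded_by_1 + contested / 2  # what 1 concedes, plus half the rest
    return a1, a2


# The garment example: A = 1, c1 = 1/2, c2 = 1.
a1, a2 = garment_rule(Fraction(1, 2), 1, 1)
```

For the garment example this returns $a_1 = 1/4$ and $a_2 = 3/4$, matching the allocation stated above.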

The Talmud also presents solutions, without explanation, to the 3-claimant 
scenario shown in [Table 1] 

An explanation for the numbers in this table remained a conundrum for over 
1,500 years. To address this, let’s explore other fairness principles. 


(1) Constrained equal allocations (CEA): Allocate the same amount to
    each creditor up to his claim; i.e., $a_i = a \wedge c_i$, where $a$ is chosen so that
    $\sum_i a_i = A$. (Recall that $a \wedge b = \min(a,b)$.)
(2) Constrained equal losses (CEL): Assign to each creditor the same loss
    $\ell_i := c_i - a_i$ up to his claim; i.e., $\ell_i = \ell \wedge c_i$, where $\ell$ is chosen so that
    $\sum_i \ell_i = C - A$.
Neither of these principles yields the Talmud allocations (see Table 2), but they
both share a consistency property, which will be the key to solving the puzzle.
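
The three rules discussed so far are easy to compute. The sketch below is one possible implementation under stated assumptions: the function names are hypothetical, arithmetic is exact via `fractions.Fraction`, CEA is computed by a standard water-filling pass over the claims in increasing order, and CEL is obtained by applying the same pass to the total loss $C - A$, as in the definition above.

```python
from fractions import Fraction


def proportional(claims, A):
    """a_i = c_i * A / C."""
    C = sum(claims)
    return [Fraction(c) * A / C for c in claims]


def cea(claims, A):
    """Constrained equal allocations: a_i = min(a, c_i) with sum a_i = A."""
    claims = [Fraction(c) for c in claims]
    remaining = Fraction(A)
    alloc = [Fraction(0)] * len(claims)
    order = sorted(range(len(claims)), key=lambda i: claims[i])
    for pos, i in enumerate(order):
        # Split what is left equally among the creditors not yet served,
        # but never give a creditor more than his claim.
        level = remaining / (len(claims) - pos)
        alloc[i] = min(claims[i], level)
        remaining -= alloc[i]
    return alloc


def cel(claims, A):
    """Constrained equal losses: l_i = min(l, c_i) with sum l_i = C - A."""
    losses = cea(claims, sum(claims) - A)
    return [Fraction(c) - loss for c, loss in zip(claims, losses)]


claims = [100, 200, 300]
table = {A: (proportional(claims, A), cea(claims, A), cel(claims, A))
         for A in (100, 200, 300)}
```

For the claims 100, 200, 300 and estates 100, 200, 300, the dictionary `table` holds the Proportional, CEA, and CEL allocations that appear in Table 2.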
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TABLE 1. This table presents the solution proposed in the Talmud for parti- 
tioning the estate of a man who has died among his three wives. 


                 Creditors’ claims
Estate size  |  100     |  200     |  300
100          |  33 1/3  |  33 1/3  |  33 1/3
200          |  50      |  75      |  75
300          |  50      |  100     |  150




FIGURE 11.10. An example illustrating constrained equal allocations (CEA) 
and constrained equal losses (CEL). The shaded area is the total assets A. 


DEFINITION 11.2.2. An allocation rule⁴ is a function $F$ mapping bankruptcy
problems $(c_1,\dots,c_n; A)$, for arbitrary $n$, to solutions $(a_1,\dots,a_n)$. Such a rule is
called pairwise consistent if


$F(c_1,\dots,c_n; A) = (a_1,\dots,a_n)$ implies that $\forall i \ne j$, $F(c_i, c_j; a_i + a_j) = (a_i, a_j)$.


More generally, an allocation rule is consistent if for any subset $S \subset \{1,\dots,n\}$, the
total allocation $\sum_{i \in S} a_i$ is split by $F$ exactly to $(a_i)_{i \in S}$.


EXERCISE 11.a. Verify that proportional division, constrained equal allocations,
and constrained equal losses are all consistent. Also, show that these allocation
rules are monotone: For a fixed set of claims $(c_1,\dots,c_n)$, the allocation $a_i$ to
each claimant is monotone in the available assets $A$.


THEOREM 11.2.3. There is a unique pairwise consistent rule $T(c_1,\dots,c_n; A)$
(the Talmud rule) which reduces to the garment rule for two creditors. This rule
is:

• If $A \le C/2$, then $a_i = a \wedge \frac{c_i}{2}$ with $a$ chosen so that $\sum_i a_i = A$. I.e.,
  $T(c_1,\dots,c_n; A) := \mathrm{CEA}\bigl(\frac{c_1}{2},\dots,\frac{c_n}{2}; A\bigr)$.
4 This is for bankruptcy problems. 
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TABLE 2. The allocations of Proportional Division, Constrained Equal Allo- 
cations (CEA), and Constrained Equal Losses (CEL) for the scenario shown 
in [Table 1 


                                Creditors’ claims
Estate size  |  Rule          |  100     |  200     |  300
100          |  Proportional  |  16 2/3  |  33 1/3  |  50
100          |  CEA           |  33 1/3  |  33 1/3  |  33 1/3
100          |  CEL           |  0       |  0       |  100
200          |  Proportional  |  33 1/3  |  66 2/3  |  100
200          |  CEA           |  66 2/3  |  66 2/3  |  66 2/3
200          |  CEL           |  0       |  50      |  150
300          |  Proportional  |  50      |  100     |  150
300          |  CEA           |  100     |  100     |  100
300          |  CEL           |  0       |  100     |  200


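• If $A \ge C/2$, let $\ell_i = \ell \wedge \frac{c_i}{2}$, with $\ell$ chosen so that $\sum_i \ell_i = C - A$, and set
  $a_i = c_i - \ell_i$. I.e.,
  $T(c_1,\dots,c_n; A) := \bigl(\frac{c_1}{2},\dots,\frac{c_n}{2}\bigr) + \mathrm{CEL}\bigl(\frac{c_1}{2},\dots,\frac{c_n}{2}; A - \frac{C}{2}\bigr)$.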


Moreover, the Talmud rule is consistent. 
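
The two cases of the Talmud rule can be checked against Table 1 with a few lines of code. This is a sketch only: `talmud` and the helper `_water_fill` are hypothetical names, and the helper computes the allocation $a \wedge c_i$ with the level $a$ chosen to meet a prescribed total, which serves both for the allocations (first case) and for the losses (second case).

```python
from fractions import Fraction


def _water_fill(caps, total):
    """Return x_i = min(level, caps_i) with the level chosen so sum x_i = total."""
    caps = [Fraction(c) for c in caps]
    remaining = Fraction(total)
    out = [Fraction(0)] * len(caps)
    order = sorted(range(len(caps)), key=lambda i: caps[i])
    for pos, i in enumerate(order):
        out[i] = min(caps[i], remaining / (len(caps) - pos))
        remaining -= out[i]
    return out


def talmud(claims, A):
    """Talmud rule: CEA on half-claims if A <= C/2, equal losses capped at
    half-claims otherwise."""
    claims = [Fraction(c) for c in claims]
    half = [c / 2 for c in claims]
    C = sum(claims)
    if A <= C / 2:
        return _water_fill(half, A)            # a_i = a ^ (c_i / 2)
    losses = _water_fill(half, C - A)          # l_i = l ^ (c_i / 2)
    return [c - loss for c, loss in zip(claims, losses)]


talmud_table = {A: talmud([100, 200, 300], A) for A in (100, 200, 300)}
```

The values in `talmud_table` agree with the three rows of Table 1, and for two creditors with $A = 1$, $c_1 = 1/2$, $c_2 = 1$ the function returns $(1/4, 3/4)$, the garment-rule allocation.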


PROOF OF THEOREM 11.2.3. It follows from Exercise 11.1 that the Talmud
rule is the garment rule for $n = 2$.
Consistency follows from the fact that if $A \le C/2$, then $\sum_{i \in S} a_i \le \sum_{i \in S} \frac{c_i}{2}$ for
every $S$, so the Talmud rule applied to $S$ is CEA with claims $(\frac{c_i}{2})_{i \in S}$. Consistency
of the CEA rule (Exercise 11.a) completes the argument. A similar argument works
for the case of $A \ge C/2$ using consistency of CEL.


For uniqueness, suppose there are two different pairwise consistent rules that
reduce to the garment rule for $n = 2$. Then on some bankruptcy problem $(c_1,\dots,c_n; A)$
they produce different allocations, say $(a_1,\dots,a_n)$ and $(b_1,\dots,b_n)$. Since $\sum_i a_i =
\sum_i b_i$, there is a pair $i, j$ with $a_i < b_i$ and $a_j > b_j$. Without loss of generality
suppose that $a_i + a_j \ge b_i + b_j$. Then the fact that $a_i < b_i$ is a contradiction to the
monotonicity in assets of the garment rule.


REMARK 11.2.4. The proof of Theorem 11.2.3 shows that any monotone rule
is uniquely determined by its restriction to pairs.


Licensed to AMS. 
License or copyright restrictions may apply to redistribution; see http://www.ams.org/publications/ebooks/terms 




FIGURE 11.11. A depiction of the Talmud rule in the two cases. 


REMARK 11.2.5. In this section, we have assumed that the claims are verifiable 
and not subject to manipulation by participants. 


Notes 


The divide and choose algorithm for cake cutting goes back to the Old Testament 
(Genesis 13): 


So Abraham said to Lot, Let us not have any quarreling between you 
and me, or between your herders and mine, for we are brothers. Is 
not the whole land before you? Let us part company. If you go to the 
left, I will go to the right; if you go to the right, I will go to the left. 
Lot looked and saw that the whole plain of the Jordan toward 
Zoar was well watered, like the garden of the Lord.... So Lot chose 
for himself the whole plain of the Jordan and set out toward the east. 


The Moving-knife Algorithm for cake cutting is due to Dubins and Spanier [DS61].
A discrete version of the Moving-knife Algorithm was discovered earlier by Banach and
Knaster [Ste48]: The first player cuts a slice, and each of the other players, in turn, is
given the opportunity to diminish it. The last diminisher gets the slice, and then the
procedure is applied to the remaining $n - 1$ players.

The use of Sperner’s Lemma to solve the envy-free cake cutting problem is due to 
Su [Su99]. 

In the setting of $n$ nonatomic measures⁵, Lyapunov showed that there always
is a partition of the cake into $n$ measurable slices $A_1, A_2, \dots, A_n$, such that $\mu_i(A_j) = 1/n$
for all $i$ and $j$. In particular, there is an envy-free partition of the cake. Note though that
even if the cake is $[0,1]$, the resulting slices can be complicated measurable sets, and no
algorithm is given to find them. An elegant proof of Lyapunov’s theorem was given by
Lindenstrauss [Lin66].

Alon proved a theorem about “splitting necklaces”, which implies that if the 
cake is [0, 1], then a perfect partition as in Lyapunov’s Theorem can be obtained by cutting 
the cake into (n — 1)n + 1 intervals and assigning each participant a suitable subset of 
these intervals. 


5 A measure $\mu(\cdot)$ is nonatomic if $\mu(A) > 0$ implies that there is $B \subset A$ with $0 < \mu(B) < \mu(A)$.
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Lyapunov, Alon and Su’s theorems are non-constructive. Selfridge and Conway (see 
e.g., [BT96]) presented a constructive procedure for finding an envy-free partition when 
$n = 3$. In 1995, Brams and Taylor described a procedure that produces an
envy-free partition for any $n$, although the number of steps it takes, while always
finite, is unbounded. Only in 2016 did Aziz and Mackenzie discover a procedure
whose complexity is bounded as a function of $n$.

The resolution of the conundrum regarding bankruptcy is due to Aumann and Maschler 
[AM85]. Another rule, proposed by O’Neill [O’N82], is the random arrival rule: Consider 
an arrival order for the claimants, and allocate to each one the minimum of his claim 
and what is left of the estate. To make this fair, the final allocation is the average of 
these allocations over all n! orderings. For an extensive discussion of fair division, see the 
books by Brams and Taylor and Robertson and Webb and the survey by 
Procaccia [Pro13]. 


Exercise 11.2 is from [BT96].


Exercises 


S 11.1. Show that the Talmud rule is monotone in A for all n and coincides with 
the garment rule for n = 2. 


11.2. Consider the following procedure to partition a round cake among three 
players: Alice, Barbara and Carol. We assume that each player has a 
continuous measure on the cake that allocates zero measure to every line 
segment (radius) that starts at the center of the cake. (Alternatively, these 
could be continuous measures on a circle.) 

• Alice positions three knives on the cake (like the hands of a clock).
  She then rotates them clockwise, with the requirement that if some
  knife reaches the initial position of the next knife, then the same must
  hold for the other two knives. (At all times the tips of the knives meet
  at the center of the cake.)

• Alice continues rotating the knives until either Barbara yells stop, or
  each knife has reached the initial position of the next knife.

• When Alice stops rotating the knives, Carol chooses one of the three
  slices determined by the knives, and then Barbara selects one of the
  two remaining slices, leaving the last slice to Alice.


Show that each player has a strategy ensuring that (no matter how 
the others play), the slice she obtains is at least as large as any other 
slice according to her measure. Thus if all three players adhere to these 
strategies, an envy-free partition will result. 

Hint: Alice should rotate the knives so that, at all times, the three 
slices they determine are of equal size according to her measure. Barbara 
should yell stop when the two largest slices (according to her measure) are 
tied. 
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Cooperative games 


In this chapter, we consider multiplayer games where players can form coali- 
tions. These come in two flavors: transferable utility (TU) games where side pay- 
ments are allowed, and nontransferable utility games (NTU). The latter includes 
settings where payments are not allowed (e.g., between voters and candidates in 
an election) or where players have different utility for money. In this chapter, we 
mostly focus on the former. 


12.1. Transferable utility games 


We review the example discussed in the preface. Suppose that three people 
are selling their wares in a market. One of them is selling a single, left-handed 
glove, while the other two are each selling a right-handed glove. A wealthy tourist 
arrives at the market in dire need of a pair of gloves, willing to pay one Bitcoin¹
for a pair of gloves. She refuses to deal with the glove-bearers individually, and 
thus, these sellers have to come to some agreement as to how to make a sale of a 
left- and right-handed glove to her and how to then split the one Bitcoin among 
themselves. Clearly, the first player has an advantage because his commodity is in 
scarcer supply. This means that he should be able to obtain a higher fraction of 
the payment than either of the other players. However, if he holds out for too high 
a fraction of the earnings, the other players may act as a coalition and require him 
to share more of the revenue. 

The question then is, in their negotiations prior to the purchase, how much can 
each player realistically demand out of the total payment made by the customer? 

To resolve this question, we introduce a characteristic function v, defined on 
subsets of the player set. In the glove market, v(S), where S is a subset of the three 
players, is 1 if, just among themselves, the players in S have both a left glove and 
a right glove. Thus, 


$v(123) = v(12) = v(13) = 1$,
and the value is 0 on every other subset of $\{1,2,3\}$. (We abuse notation in this
chapter and sometimes write $v(12)$ instead of $v(\{1,2\})$, etc.)

More generally, a cooperative game with transferable utilities is defined
by a characteristic function $v$ on subsets of the $n$ players, where $v : 2^{\{1,\dots,n\}} \to \mathbb{R}$
and $v(S)$ is the value, or payoff, that subset $S$ of players can achieve on their own
regardless of what the remaining players do. This value can then be split among the
players in any way that they agree on. The characteristic function satisfies the
following properties:

• $v(\emptyset) = 0$.
• Monotonicity: If $S \subseteq T$, then $v(S) \le v(T)$.


1 A Bitcoin is a unit of digital currency that was worth $100 at the time of the transaction.
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FIGURE 12.1 


Given a characteristic function $v$, each possible outcome of the game is an
allocation vector $\psi(v) \in \mathbb{R}^n$, where $\psi_i(v)$ is the share of the payoff allocated to
player $i$.

What is a plausible outcome of the game? We will see several different solution 
concepts. 


12.2. The core 
An allocation vector $\psi = \psi(v)$ is in the core if it satisfies the following two
properties:
• Efficiency: $\sum_i \psi_i = v(\{1,\dots,n\})$. This means, by monotonicity, that,
  between them, the players extract the maximum possible total value.

• Stability: Each coalition is allocated at least the payoff it can obtain on
  its own; i.e., for every set $S$,

  $\sum_{i \in S} \psi_i \ge v(S)$.


For the glove market, an allocation vector in the core must satisfy

$\psi_1 + \psi_2 \ge 1$,
$\psi_1 + \psi_3 \ge 1$,
$\psi_1 + \psi_2 + \psi_3 = 1$.

Together with $\psi_i \ge 0$ (stability for single players), this system has only one
solution: $\psi_1 = 1$ and $\psi_2 = \psi_3 = 0$.
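
For small games, membership in the core can be checked by enumerating all coalitions. The sketch below is illustrative: `in_core` and `glove` are hypothetical names, a coalition is represented as a `frozenset` of player numbers, and a small tolerance is used so that floating-point allocations can also be tested.

```python
from fractions import Fraction
from itertools import combinations


def in_core(v, psi, tol=1e-12):
    """Check efficiency and stability of the allocation psi for the game v.

    v maps a frozenset of players (numbered 1..n) to a value; psi[i - 1] is
    the share of player i.
    """
    n = len(psi)
    players = range(1, n + 1)
    if abs(sum(psi) - v(frozenset(players))) > tol:
        return False  # efficiency fails
    for size in range(1, n):
        for S in combinations(players, size):
            if sum(psi[i - 1] for i in S) < v(frozenset(S)) - tol:
                return False  # coalition S can do better on its own
    return True


def glove(S):
    """Glove market: player 1 holds the left glove, players 2 and 3 right gloves."""
    return 1 if 1 in S and (2 in S or 3 in S) else 0


core_point = in_core(glove, (1, 0, 0))
other_point = in_core(glove, (Fraction(2, 3), Fraction(1, 6), Fraction(1, 6)))
```

Here `core_point` is true, while `other_point` is false because the coalition $\{1,2\}$ receives only $5/6 < v(12) = 1$.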


EXAMPLE 12.2.1 (Miners and Gold). Consider a set of miners who have
discovered large bars of gold. The value of the loot to the group is the number of
bars that they can carry home. It takes two miners to carry one bar, and thus the
value of the loot to any subset of $k$ miners is $\lfloor k/2 \rfloor$.

If the total number of miners is even, then the vector $\psi = (1/2,\dots,1/2)$ is in
the core. On the other hand, if $n$ is odd, say 3, then the core conditions require
that

$\psi_1 + \psi_2 \ge 1$,
$\psi_1 + \psi_3 \ge 1$,
$\psi_2 + \psi_3 \ge 1$,
$\psi_1 + \psi_2 + \psi_3 = 1$.

This system has no solution: summing the three inequalities gives
$2(\psi_1 + \psi_2 + \psi_3) \ge 3$, which contradicts the last equation.


EXAMPLE 12.2.2 (Splitting a dollar). A parent offers his two children $100
if they can agree on how to split it. If they can’t agree, they will each get $10. In
this case v(12) = 100, whereas v(1) = v(2) = 10. The core conditions require that

$\psi_1 \ge 10$, $\psi_2 \ge 10$, and $\psi_1 + \psi_2 = 100$,

which clearly has multiple solutions.


The drawback of the core, as we saw in these examples, is that it may be 
empty or it might contain multiple allocation vectors. This motivates us to consider 
alternative solution concepts. 


12.3. The Shapley value 


Another way to choose the allocation $\psi(v)$ is to adopt an axiomatic approach,
wherein a set of desirable properties for the solution is enumerated.


12.3.1. Shapley’s axioms.
(1) Symmetry: If $v(S \cup \{i\}) = v(S \cup \{j\})$ for all $S$ with $i, j \notin S$, then
    $\psi_i(v) = \psi_j(v)$.
(2) Dummy: A player that doesn’t add value gets nothing; i.e., if $v(S \cup \{i\}) =
    v(S)$ for all $S$, then $\psi_i(v) = 0$.
(3) Efficiency: $\sum_i \psi_i(v) = v(\{1,\dots,n\})$.
(4) Additivity: $\psi_i(v + u) = \psi_i(v) + \psi_i(u)$.
The first three axioms are self-explanatory. To motivate the additivity axiom, 
imagine the same players engage in two consecutive games, with characteristic 
functions v and u, respectively. This axiom states that the outcome in one game 
should not affect the other, and thus, in the combined game, the allocation to a 
player is the sum of his allocations in the component games. 
We shall see that there is a unique choice for the allocation vector, given these
axioms. This unique choice for each $\psi_i(v)$ is called the Shapley value of player $i$
in the game defined by characteristic function $v$.


EXAMPLE 12.3.1 (The S-veto game). Consider a coalitional game with $n$
players, in which a fixed subset $S$ of the players holds all the power. We will denote
the characteristic function here by $w_S$, defined as follows: $w_S(T)$ is 1 if $T$ contains
$S$ and it is 0 otherwise. Suppose $\psi(v)$ satisfies Shapley’s axioms. By the dummy
axiom,

$\psi_i(w_S) = 0$ if $i \notin S$.

Then, for $i, j \in S$, the symmetry axiom gives $\psi_i(w_S) = \psi_j(w_S)$. Finally, the
efficiency axiom implies that

$\psi_i(w_S) = \frac{1}{|S|}$ if $i \in S$.

Similarly, we can derive that $\psi_i(c\,w_S) = c\,\psi_i(w_S)$ for any $c \in [0,\infty)$. Note that to
derive this, we did not use the additivity axiom.


Glove Market, again: We can now use our understanding of the S-veto game
to solve for the Shapley values $\psi(v)$, where $v$ is the characteristic function of the
Glove Market game.
Observe that for bits and for $\{0,1\}$-valued functions
$u \vee w = \max(u,w) = u + w - u \cdot w$.
With $w_{12}$, etc., defined as in Example 12.3.1, we have that for every $S$,
$v(S) = w_{12}(S) \vee w_{13}(S) = w_{12}(S) + w_{13}(S) - w_{123}(S)$.
Thus, the additivity axiom gives
$\psi_i(v) = \psi_i(w_{12}) + \psi_i(w_{13}) - \psi_i(w_{123})$.
We conclude from this that $\psi_1(v) = 1/2 + 1/2 - 1/3 = 2/3$, whereas $\psi_2(v) =
\psi_3(v) = 0 + 1/2 - 1/3 = 1/6$. Thus, under Shapley’s axioms, player 1 obtains a
two-thirds share of the payoff, while players 2 and 3 equally share one-third between
them.


The calculation of $\psi_i(v)$ we just did for the glove game relied on the representation
of $v$ as a linear combination of S-veto functions $w_S$. Such a representation
always exists.


LEMMA 12.3.2. For any characteristic function $v : 2^{[n]} \to \mathbb{R}$, there is a unique
choice of coefficients $c_S$ such that

$v = \sum_{S \ne \emptyset} c_S w_S$.


PROOF. The system of $2^n - 1$ equations in the $2^n - 1$ unknowns $c_S$, that is,
for all nonempty $T \subseteq [n]$,

$v(T) = \sum_{\emptyset \ne S \subseteq [n]} c_S w_S(T)$, (12.1)

has a unique solution. To see this, observe that if the subsets of $[n]$ are ordered in
increasing cardinality, then the matrix $w_S(T)$ is upper triangular, with 1’s along
the diagonal. For example, with $n = 2$, rows indexed by $S$ and columns indexed by
$T$, the matrix $w_S(T)$ is

          {1}   {2}   {12}
  {1}      1     0     1
  {2}      0     1     1
  {12}     0     0     1
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EXAMPLE 12.3.3 (Four Stockholders). Four people own stock in ACME. 
Player i holds i units of stock, for each i € {1,2,3,4}. Six shares are needed to 
pass a resolution at the board meeting. Here $v(S)$ is 1 if subset $S$ of players have
enough shares of stock among them to pass a resolution. Thus,

$1 = v(1234) = v(24) = v(34)$,

while $v = 1$ on any 3-tuple and $v = 0$ in each other case. In this setting, the
Shapley value $\psi_i(v)$ for player $i$ represents the power of player $i$ and is known as
the Shapley-Shubik power index. By Lemma 12.3.2, we know that the system
of equations

$v = \sum_{S \ne \emptyset} c_S w_S$

has a solution. Solving this system, we find that

$v = w_{24} + w_{34} + w_{123} - w_{234} - w_{1234}$,

from which

$\psi_1(v) = 1/3 - 1/4 = 1/12$

and

$\psi_2(v) = 1/2 + 1/3 - 1/3 - 1/4 = 1/4$,

while $\psi_3(v) = 1/4$, by symmetry with player 2. Finally, $\psi_4(v) = 5/12$. It is
interesting to note that the person with two shares and the person with three
shares have equal power.
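
The decomposition into S-veto games can be computed mechanically. Since $w_S(T) = 1$ exactly when $S \subseteq T$, equation (12.1) reads $v(T) = \sum_{\emptyset \ne S \subseteq T} c_S$, so the coefficients can be found by substitution in order of increasing cardinality. The sketch below assumes an integer-valued $v$ given as a function on `frozenset`s; the names `veto_coefficients` and `shapley_from_coefficients` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def veto_coefficients(n, v):
    """Solve v(T) = sum of c_S over nonempty S contained in T, for all T."""
    coeff = {}
    players = range(1, n + 1)
    for size in range(1, n + 1):  # increasing cardinality
        for T in map(frozenset, combinations(players, size)):
            # All proper nonempty subsets of T already have their coefficient.
            coeff[T] = v(T) - sum(c for S, c in coeff.items() if S < T)
    return coeff


def shapley_from_coefficients(n, coeff):
    """psi_i(v) = sum over S containing i of c_S / |S| (requires integer c_S)."""
    return [sum(Fraction(c, len(S)) for S, c in coeff.items() if i in S)
            for i in range(1, n + 1)]


def stock(S):
    """Four Stockholders: player i holds i shares; six shares pass a resolution."""
    return 1 if sum(S) >= 6 else 0


coeff = veto_coefficients(4, stock)
nonzero = {tuple(sorted(S)): c for S, c in coeff.items() if c != 0}
power = shapley_from_coefficients(4, coeff)
```

For Four Stockholders the nonzero coefficients in `nonzero` are $+1$ on $\{2,4\}$, $\{3,4\}$, $\{1,2,3\}$ and $-1$ on $\{2,3,4\}$, $\{1,2,3,4\}$, and `power` equals $(1/12, 1/4, 1/4, 5/12)$, in agreement with the example.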


EXERCISE 12.a. Show that Four Stockholders has no solution in the core. 


12.3.2. Shapley’s Theorem. Consider a fixed ordering of the players, defined
by a permutation $\pi$ of $[n] = \{1,\dots,n\}$. Imagine the players arriving one by
one according to this permutation $\pi$, and define $\phi_i(v,\pi)$ to be the marginal
contribution of player $i$ at the time of his arrival assuming players arrive in this order.
That is,

$\phi_i(v,\pi) = v(\pi\{1,\dots,k\}) - v(\pi\{1,\dots,k-1\})$ where $\pi(k) = i$. (12.2)

Notice that if we were to set $\psi_i(v) = \phi_i(v,\pi)$ for any fixed $\pi$, the dummy, efficiency,
and additivity axioms would be satisfied.

To satisfy the symmetry axiom as well, we will instead imagine that the players
arrive in a random order and define $\psi_i(v)$ to be the expected value of $\phi_i(v,\pi)$ when
$\pi$ is chosen uniformly at random.


REMARK 12.3.4. If $v(\cdot)$ is $\{0,1\}$-valued, then $\psi_i(v)$ is the probability that player
$i$’s arrival converts a losing coalition to a winning coalition.


THEOREM 12.3.5. Shapley’s four axioms uniquely determine the functions $\psi_i$.
They are given by the random arrival formula:

$\psi_i(v) = \frac{1}{n!} \sum_{\pi \in S_n} \phi_i(v,\pi)$. (12.3)


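REMARK 12.3.6. $\phi_i(v,\pi)$ depends on $\pi$ only via the set $S = \{j : \pi^{-1}(j) <
\pi^{-1}(i)\}$ of players that precede $i$ in $\pi$. Therefore

$\psi_i(v) = \sum_{S \subseteq [n] \setminus \{i\}} \frac{|S|!\,(n - |S| - 1)!}{n!} \bigl(v(S \cup \{i\}) - v(S)\bigr)$.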
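
The random arrival formula (12.3) translates directly into code for small $n$. The following is a brute-force sketch, not an efficient algorithm: it enumerates all $n!$ arrival orders, the function name `shapley_random_arrival` is hypothetical, and $v$ is assumed to be integer-valued and defined on `frozenset`s of players numbered $1,\dots,n$.

```python
from fractions import Fraction
from itertools import permutations


def shapley_random_arrival(n, v):
    """Average the marginal contributions phi_i(v, pi) over all arrival orders."""
    totals = [0] * n
    count = 0
    for order in permutations(range(1, n + 1)):
        arrived = set()
        for i in order:
            before = v(frozenset(arrived))
            arrived.add(i)
            # Marginal contribution of player i at the time of his arrival.
            totals[i - 1] += v(frozenset(arrived)) - before
        count += 1
    return [Fraction(t, count) for t in totals]


def glove(S):
    """Glove market: player 1 holds the left glove, players 2 and 3 right gloves."""
    return 1 if 1 in S and (2 in S or 3 in S) else 0


def stock(S):
    """Four Stockholders: player i holds i shares; six shares pass a resolution."""
    return 1 if sum(S) >= 6 else 0


glove_values = shapley_random_arrival(3, glove)
stock_values = shapley_random_arrival(4, stock)
```

This gives $(2/3, 1/6, 1/6)$ for the glove market and $(1/12, 1/4, 1/4, 5/12)$ for Four Stockholders, the same values obtained earlier from the S-veto decomposition.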
PROOF. First, we prove that the functions $\psi_i(v)$ are uniquely determined by
$v$ and the four axioms. By Lemma 12.3.2, we know that any characteristic function
$v$ can be uniquely represented as a linear combination of S-veto characteristic
functions $w_S$.

Recalling that $\psi_i(w_S) = 1/|S|$ if $i \in S$ and $\psi_i(w_S) = 0$ otherwise, we apply the
additivity axiom and conclude that $\psi_i(v)$ is uniquely determined:

$\psi_i(v) = \psi_i\Bigl(\sum_{\emptyset \ne S \subseteq [n]} c_S w_S\Bigr) = \sum_{\emptyset \ne S \subseteq [n]} \psi_i(c_S w_S) = \sum_{S \subseteq [n],\, i \in S} \frac{c_S}{|S|}$.

We complete the proof by showing that the specific values given in the statement
of the theorem satisfy all of the axioms. Recall the definition of $\phi_i(v,\pi)$ from (12.2).
By averaging over all permutations $\pi$ and then defining $\psi_i(v)$ as in (12.3), we claim
that all four axioms are satisfied. Since averaging preserves the dummy, efficiency,
and additivity axioms, we only need to prove the intuitive fact that by averaging
over all permutations, we obtain symmetry.

To this end, suppose that $i$ and $j$ are such that

$v(S \cup \{i\}) = v(S \cup \{j\})$

for all $S \subset [n]$ with $S \cap \{i,j\} = \emptyset$. For every permutation $\pi$, define $\pi^*$ to be the
same as $\pi$ except that the positions of $i$ and $j$ are switched. Then

$\phi_i(v,\pi) = \phi_j(v,\pi^*)$.

Using the fact that the map $\pi \mapsto \pi^*$ is a one-to-one map from $S_n$ to itself for
which $\pi^{**} = \pi$, we obtain

$\psi_i(v) = \frac{1}{n!} \sum_{\pi \in S_n} \phi_i(v,\pi) = \frac{1}{n!} \sum_{\pi \in S_n} \phi_j(v,\pi^*) = \frac{1}{n!} \sum_{\pi^* \in S_n} \phi_j(v,\pi^*) = \psi_j(v)$.

Therefore, $\psi(v) = (\psi_1(v),\dots,\psi_n(v))$ are indeed the unique Shapley values.


12.3.3. Additional examples. 


EXAMPLE 12.3.7 (A fish with little intrinsic value). A seller s has a fish
having little intrinsic value to him; i.e., he values it at $2. A buyer b values the
fish at $10. Thus, v(s) = 2 and v(b) = 0. Denote by x the sale price. Then
v(s, b) = x + (10 − x) = 10 for x ≥ 2.

In this game, any allocation of the form ψ_s(v) = x and ψ_b(v) = 10 − x (for
2 ≤ x ≤ 10) is in the core. On the other hand, the Shapley values are ψ_s(v) = 6
and ψ_b(v) = 4.

Note, however, that the value of the fish to b and s is private information, and 
if the price is determined by the formula above, they would have an incentive to 
misreport their values. 


EXAMPLE 12.3.8 (Many right gloves). Consider the following variant of the 
glove game. There are n = r+ 2 players. Players 1 and 2 have left gloves. The 
remaining players each have a right glove. Thus, the characteristic function v(S) is 
the maximum number of proper and disjoint pairs of gloves owned by players in S. 
We compute the Shapley value. Note that $\psi_1(v) = \psi_2(v)$ and that $\psi_j(v) = \psi_3(v)$
for each $j \ge 3$. By the efficiency axiom, we have


$2\psi_1(v) + r\psi_3(v) = 2$

provided that $r \ge 2$. To determine the Shapley value of the third player, we consider
all permutations $\pi$ with the property that the third player adds value to the group
of players that precede him in $\pi$. These are the following orders:

$13, \quad 23, \quad \{1,2\}3, \quad \{1,2,j\}3,$

where $j$ is any value in $\{4,\dots,n\}$ and the curly brackets mean that each permutation
of the elements in curly brackets is included. The number of permutations
corresponding to each of these possibilities is $r!$, $r!$, $2(r-1)!$, and $6(r-1)\cdot(r-2)!$.
Thus,

$\psi_3(v) = \frac{2r! + 8(r-1)!}{(r+2)!} = \frac{2r+8}{(r+2)(r+1)r}$.

12.4. Nash bargaining 


EXAMPLE 12.4.1. The owner of a house and a potential buyer are negotiating
over the price. The house is worth one (million) dollars to the seller, but it is worth
$1 + s$ (million) dollars to the buyer. Thus, any price $p$ they could agree on must be
in $[1, 1+s]$. However, the seller already has an offer of $1 + d_1$, and the buyer has
an alternative house that she could buy (also worth $1 + s$ to her) for $1 + s - d_2$.
(Assume that $d_1 + d_2 < s$.) If they come to agreement on a price $p$, then the utility
to the seller will be $p - 1$ and the utility to the buyer will be $1 + s - p$. If the
negotiation breaks down, they can each accept their alternative offers, resulting in
a utility of $d_1$ for the seller and $d_2$ for the buyer. At what price might we expect
their bargaining to terminate?


DEFINITION 12.4.2. A two-person bargaining problem is defined by a closed,
bounded convex set $S \subset \mathbb{R}^2$ and a point $d = (d_1, d_2)$. The set $S$ represents the
possible payoffs that the two players could potentially come to an agreement on, and
$d$ represents the payoffs that will result if they are unable to come to an agreement,
sometimes called the disagreement point. We assume there is a point $(x_1, x_2) \in S$
such that $x_1 > d_1$ and $x_2 > d_2$ and that $x \ge d$ for all $x \in S$ (since no player will
accept an outcome below his disagreement value).


In Example[12.4.1| S = { (x1, £2) | t1 +2£2 < $, 11 > dı, x2 > d2} and d = (di, d2). 
DEFINITION 12.4.3. A solution to a bargaining problem is a mapping F 
that takes each instance (S, d) and outputs an agreement point 
F(S,d)=aeS. 
For a = (a1, @2), the final payoff for player I is a; and the final payoff for player I 
is a2. 


What constitutes a fair/reasonable solution? The approach taken by John Nash 
was to formulate a set of axioms that a fair solution should satisfy. These are known 
as the Nash bargaining axioms: 


• Affine covariance: Let

$$\Psi(x_1, x_2) = (\alpha_1 x_1 + \beta_1,\ \alpha_2 x_2 + \beta_2). \tag{12.4}$$

We say that $F(\cdot)$ is affine covariant if for any real $\alpha_1, \alpha_2, \beta_1, \beta_2$ with
$\alpha_1, \alpha_2 > 0$ and any bargaining problem $(S,d)$,

$$F(\Psi(S), \Psi(d)) = \Psi(F(S,d)).$$

• Pareto optimality: If $F(S,d) = a$ and if $a' = (a'_1, a'_2) \in S$ satisfies $a'_1 \ge a_1$
and $a'_2 \ge a_2$, then $a' = a$.

• Symmetry: For any bargaining problem $(S,d)$ such that $d_1 = d_2$ and
$(x,y) \in S \Rightarrow (y,x) \in S$, we have $F(S,d) = (a,a)$ for some $(a,a) \in S$.

• Independence of Irrelevant Alternatives (IIA): If $(S,d)$ and $(S',d)$ are
two bargaining problems such that $S \subset S'$ and if $F(S',d) \in S$, then
$F(S,d) = F(S',d)$.


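DEFINITION 12.4.4. The Nash bargaining solution $F^N(S,d) = a = (a_1,a_2)$
is the solution to the following maximization problem:

$$\begin{aligned}
\text{maximize}\quad & \prod_{i=1}^{2} (x_i - d_i) \\
\text{subject to}\quad & x_1 \ge d_1,\ x_2 \ge d_2, \\
& (x_1, x_2) \in S.
\end{aligned} \tag{12.5}$$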


REMARK 12.4.5. The Nash bargaining solution $F^N(\cdot)$ always exists since $S$ is
closed and bounded and contains a point with $x_1 > d_1$ and $x_2 > d_2$. Moreover, the
solution is unique. To see this, without loss of generality, assume that $d_1 = d_2 = 0$,
and suppose there are two distinct optimal solutions $(x,y)$ and $(w,z)$ to (12.5) with

$$x \cdot y = w \cdot z = \alpha. \tag{12.6}$$

Observe that the function $f(t) = \alpha/t$ is strictly convex and therefore

$$f\left(\frac{x+w}{2}\right) < \frac{f(x) + f(w)}{2}.$$

Using (12.6), this is equivalent to

$$\frac{\alpha}{(x+w)/2} < \frac{y+z}{2}.$$

This contradicts the assumption that $(x,y)$ and $(w,z)$ are optimal solutions to the
maximization problem (12.5) because $\left(\frac{x+w}{2}, \frac{y+z}{2}\right)$ (also feasible due to the convexity
of $S$) yields a larger product.


EXERCISE 12.b. Check that in Example 12.4.1, where
$S = \{(x_1,x_2) \mid x_1 + x_2 \le s,\ x_1 \ge d_1,\ x_2 \ge d_2\}$ and $d = (d_1,d_2)$,
the Nash bargaining solution and the Shapley values are both $\left(\frac{s+d_1-d_2}{2}, \frac{s+d_2-d_1}{2}\right)$.
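As a numerical illustration of the maximization problem (12.5) in the house example, the following sketch searches the line $x_1 + x_2 = s$ (where the maximum must lie, since the product increases in each coordinate) on a grid. The function name `nash_split` and the sample values of $s$, $d_1$, $d_2$ are assumptions chosen for illustration, not part of the text.

```python
def nash_split(s, d1, d2, steps=100_000):
    """Grid search for the maximizer of (x1 - d1) * (x2 - d2) on x1 + x2 = s."""
    best_x1, best_product = d1, 0.0
    for k in range(steps + 1):
        x1 = d1 + (s - d1 - d2) * k / steps   # x1 ranges over [d1, s - d2]
        x2 = s - x1
        product = (x1 - d1) * (x2 - d2)
        if product > best_product:
            best_x1, best_product = x1, product
    return best_x1, s - best_x1

if __name__ == "__main__":
    s, d1, d2 = 0.5, 0.1, 0.2          # hypothetical values with d1 + d2 < s
    print(nash_split(s, d1, d2))       # numerical maximizer
    print((s + d1 - d2) / 2, (s + d2 - d1) / 2)   # closed form from the exercise
```

With the seller as player I, the corresponding price would be $p = 1 + x_1$.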


THEOREM 12.4.6. The solution $F^N(\cdot)$ is the unique function satisfying the Nash
bargaining axioms.

PROOF. We first observe that $F^N(\cdot)$ satisfies the axioms.

• Affine covariance follows from the identity

$$\prod_i \left[(\alpha_i x_i + \beta_i) - (\alpha_i d_i + \beta_i)\right] = \prod_i \alpha_i \prod_i (x_i - d_i).$$

• To check Pareto optimality, observe that² if $y \ge x$, $y \ne x$, and $x_i > d_i$ for each $i$
(as holds at the maximizer, where the product is positive), then
$\prod_i (y_i - d_i) > \prod_i (x_i - d_i)$.

• Symmetry: Let $a = (a_1,a_2) = F^N(S,d)$ be the solution to (12.5), where
$(S,d)$ is symmetric. Then $(a_2,a_1)$ is also an optimal solution to (12.5),
so, by the uniqueness of the solution, we must have $a_1 = a_2$.

• IIA: Consider any $S \subset S'$. If $F^N(S',d) \in S$, then it must be a solution
to (12.5) in $S$. By uniqueness, it must coincide with $F^N(S,d)$.

Next we show that any $F(\cdot)$ that satisfies the axioms is equal to the Nash
bargaining solution. We first prove this assuming that the bargaining problem that
we are considering has disagreement point $d = (0,0)$ and $F^N(S,d) = (1,1)$. We
will argue that this assumption, together with the symmetry axiom and IIA, implies
that $F(S,d) = (1,1)$.

To this end, let $S'$ be the convex hull of $S \cup \{x^T \mid x \in S\}$, where $x^T := (x_2, x_1)$.
For every $x \in S$, convexity implies that $(1-\lambda)(1,1) + \lambda x \in S$ for $\lambda \in [0,1]$, so,
since $(1,1)$ maximizes the product over $S$, $\varphi(\lambda) := (1-\lambda+\lambda x_1)(1-\lambda+\lambda x_2) \le 1$.
Since $\varphi(0) = 1$, we infer that $0 \ge \varphi'(0) = x_1 + x_2 - 2$. (See also the figure below
for an alternative argument.)

Thus, $S \cup \{x^T \mid x \in S\} \subset \{x \ge 0 \mid x_1 + x_2 \le 2\}$; the set on the right is
convex, so it must contain $S'$ as well. Therefore $(1,1)$ is Pareto optimal in $S'$, so
the symmetry axiom (together with Pareto optimality) yields $F(S',(0,0)) = (1,1)$.
Since $(1,1) \in S$, the IIA axiom gives $F(S,(0,0)) = (1,1)$.


FIGURE 12.2. The line $x_1 + x_2 = 2$ is tangent to the hyperbola $x_1 \cdot x_2 = 1$ at
$(1,1)$. Therefore, any line segment from a point $(a,b)$ with $a+b > 2$ to the
point $(1,1)$ must intersect the region $x_1 \cdot x_2 > 1$.


Finally, we argue that $F(S,d) = F^N(S,d)$ for an arbitrary bargaining problem
$(S,d)$ with $F^N(S,d) = a$. To this end, find an affine function $\Psi$ of the form used
in the affine covariance axiom such that $\Psi(d) = 0$ and $\Psi(a) = (1,1)$.

By the affine covariance axiom,

$$\Psi(F(S,d)) = F(\Psi(S), 0) = F^N(\Psi(S), 0) = \Psi(F^N(S,d)),$$

which means that $F(S,d) = F^N(S,d)$.


2 The vector notation $y \ge x$ means that $y_i \ge x_i$ for all $i$.


Suppose that player $i$ has strictly increasing utility $U_i(x_i)$ for an allocation $x_i$.
Often it is assumed that these utility functions are concave³, since the same gain is
often worth less to a player when he is rich than when he is poor.

When some players have non-linear utility functions, it is not reasonable to
require the affine covariance axiom for monetary allocations. Rather we require
affine covariance of the vector of utilities obtained by the players. The same
considerations apply to the symmetry axiom. The other axioms (Pareto and IIA) are
not affected by applying a monotone bijection to each allocation.

Thus, we will seek the Nash bargaining solution in utility space. That is, we
will apply Nash bargaining to the set $S_U = \{(U_1(x_1), U_2(x_2)) \mid (x_1,x_2) \in S\}$ with
disagreement point $(U_1(d_1), U_2(d_2))$.


EXAMPLE 12.4.7. Consider two players that need to split a dollar between
them. Suppose that player I is risk-neutral ($U_1(x_1) = x_1$) and the other has a
strictly increasing, concave utility function; his utility for a payoff of $x_2$ is $U_2(x_2)$,
where $U_2(0) = 0$, $U_2(1) = 1$.

The Nash bargaining solution is the maximum of $U_1(x_1)U_2(x_2) = x_1 U_2(x_2)$
over $\{x \ge 0 : x_1 + x_2 \le 1\}$. Since $U := U_2$ is increasing, this reduces to maximizing
$f(x) := xU(1-x)$ over $x \in [0,1]$. Observe that for all $x < 1/2$,

$$f'(x) = U(1-x) - xU'(1-x) > 0$$

(by concavity and $U(0) = 0$ we have $U(1-x) \ge (1-x)U'(1-x)$, and $1-x > x$),
and therefore $f(x)$ is maximized at some $x \ge 1/2$. In other words, at the Nash bargaining
solution, the risk-neutral player gets at least half of the dollar (more than half when
$U$ is strictly concave), and the risk-averse player loses out.

For example, with $U_2(x_2) = \sqrt{x_2}$, the Nash bargaining solution is obtained by
maximizing $f(x) = x\sqrt{1-x}$ over $[0,1]$. The optimal choice is $x = 2/3$, which yields
a Nash bargaining solution of $(2/3, 1/3)$.
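The value $x = 2/3$ can be checked numerically. The sketch below is a plain grid search over $x \in [0,1]$ for $f(x) = xU(1-x)$; the helper name `best_share` and the grid resolution are illustrative assumptions.

```python
import math

def best_share(utility, steps=60_000):
    """Grid search for the x in [0, 1] maximizing x * utility(1 - x)."""
    best_x, best_val = 0.0, -1.0
    for k in range(steps + 1):
        x = k / steps
        val = x * utility(1.0 - x)
        if val > best_val:
            best_x, best_val = x, val
    return best_x

if __name__ == "__main__":
    print(best_share(math.sqrt))        # square-root utility from the example
    print(best_share(lambda t: t))      # risk-neutral opponent, for comparison
```

Swapping in another concave utility with $U(0) = 0$ and $U(1) = 1$ gives a quick way to see how the risk-neutral player's share responds to the other player's risk aversion.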


Notes 


The notion of a cooperative game is due to von Neumann and Morgenstern [VNM53].
Many different approaches to defining allocation rules, i.e., the shares $\psi(v)$, have been
proposed, based on either stability (subgroups should not have an incentive to abandon
the grand coalition) or fairness. The notion of core is due to Gillies [Gil59]. The definition
and axiomatic characterization of the Shapley value is due to Shapley [Sha53b]. See
also [Rot88]. Another index of the power of a player in a cooperative game where $v(\cdot) \in \{0,1\}$
is due to Banzhaf [BI64]. Recalling the earlier definition of $I_i(\cdot)$, the Banzhaf index of player $i$
is defined as $I_i(2v-1)/\sum_j I_j(2v-1)$. See Owen for more details.

An important solution concept omitted from this chapter is the nucleolus, due to
Schmeidler [Sch69]. For a possible allocation vector $\psi(v) = (\psi_1,\dots,\psi_n)$ with $\sum_{i=1}^n \psi_i = v([n])$,
define the excess of coalition $S$ with respect to $\psi$ to be $e(S,\psi) := v(S) - \sum_{i \in S} \psi_i$,
i.e., how unhappy coalition $S$ is with the allocation $\psi$. Among all allocation vectors,
consider the ones that minimize the largest excess. Of these, consider those that minimize
the second largest excess, and so on. The resulting allocation is called the nucleolus. (Note
that when core allocations exist, the nucleolus is one of them.)


3 A player with a concave utility function is called risk-averse because he prefers an allocation
of $(x+y)/2$ to a lottery where he would receive $x$ or $y$ with probability 1/2 each.

Recall the setup of bankruptcy problems from §11.2. There is an associated cooperative
game defined as follows: Given any set of creditors $S$, let $v(S) := \max\left(A - \sum_{i \notin S} c_i,\ 0\right)$.
The corresponding Shapley values coincide with O’Neill’s solution of the bankruptcy problem
[O’N82], and the nucleolus coincides with the Talmud rule [AM85]. For more on this
topic and on cooperative games in general, see [MSZ13].


Lloyd Shapley John Nash 


The Nash bargaining solution is from [NJ50]. The IIA axiom is controversial. For
instance, in Figure 12.3, player II might reasonably feel he is entitled to a higher allocation
than player I in $S'$ than in $S$.


[Figure 12.3 (image not reproduced): three panels on axes Player I and Player II with
disagreement point $(d_1,d_2) = (0,0)$, marking the point $(1,1)$, each player's utopia value,
the utopia point $(a_1,a_2) = (1,2)$, and the KS solution.]


FIGURE 12.3. In the leftmost figure, each player gets half his utopia value, the 
largest value he could obtain at any point in S. By IIA, in the middle figure, the
Nash bargaining solution gives player I her utopia value but gives player II 
only half his utopia value. The rightmost figure shows the Kalai-Smorodinsky 
solution for this example. 


Kalai and Smorodinsky addressed this by proposing another solution. If
$a_i := \max_{x \in S} x_i$ is the utopia value for player $i$, then the Kalai-Smorodinsky (KS) bargaining
solution $F^{KS}(S,d)$ is the point in $S$ closest to $a = (a_1,a_2)$ on the line connecting $d$ with $a$.
The KS solution satisfies the affine covariance, Pareto optimality, and symmetry axioms.
While it doesn’t satisfy IIA, it does satisfy a monotonicity assumption: If $S \subset T$ and
$a(S,d) = a(T,d)$, then $F^{KS}(S,d) \le F^{KS}(T,d)$. See Figure 12.3.

Another criticism of the axiomatic approach to bargaining is that it doesn’t describe
how the players will arrive at the solution. This was addressed by Rubinstein [Rub82],
who described the process of bargaining as an extensive form game in which players take
turns making offers and delays are costly. In the first round, player I makes an offer in $S$.
If it is accepted, the game ends. If not, then all outcomes (both $S$ and $d$) are scaled by a
factor of $1-\delta$, and it is player II’s turn to make an offer. This continues until an offer is
accepted. See Exercise 12.2.
In his book [Leo10], R. Leonard recounts the following story⁴ about Shapley: John
von Neumann, the preeminent mathematician of his day, was standing at the blackboard
in the Rand Corporation offices explaining a complicated proof that a certain game had 
no solution. “No! No!” interjected a young voice from the back of the room, “That can 
be done much more simply!” 
You could have heard a pin drop. Even years later, Hans Speier remembered the 
moment: 
Now my heart stood still, because I wasn’t used to this sort of thing. 
Johnny von Neumann said, “Come up here, young man. Show me.” 
He goes up, takes the piece of chalk, and writes down another deriva- 
tion, and Johnny von Neumann interrupts and says, “Not so fast, 
young man. I can’t follow.” 
Now he was right; the young man was right. Johnny von Neu- 
mann, after this meeting, went to John Williams and said, “Who is 
this boy?” 


Exercises 


12.1. The glove market revisited. A proper pair of gloves consists of a left
glove and a right glove. There are $n$ players. Player 1 has two left gloves,
while each of the other $n-1$ players has one right glove. The payoff $v(S)$
for a coalition $S$ is the number of proper pairs that can be formed from the
gloves owned by the members of $S$.

(a) For $n = 3$, determine $v(S)$ for each of the seven nonempty sets
$S \subseteq \{1,2,3\}$. Then find the Shapley value $\psi_i(v)$ for each of the players
$i = 1,2,3$.

(b) For a general $n$, find the Shapley value $\psi_i(v)$ for each of the $n$
players $i = 1,2,\dots,n$.


12.2. Rubinstein bargaining: Consider two players deciding how to split a 
cake of size 1. The players alternate making offers for how to split the cake 
until one of the players accepts the offer made by the other. Suppose also 
that there is a cost to delaying: If a player rejects the current offer by the 
other, the value of the cake goes down by a factor of $1-\delta$. Consider a
strategy profile where each player offers a fraction x of the cake (in his turn) 
and accepts no offer in which he receives a fraction less than x. Determine 
which values of x yield an equilibrium and which of these are subgame 
perfect. 


4 This story is adapted from page 294 in Leonard’s book.

CHAPTER 13


Social choice and voting 


Suppose that the individuals in a society are presented with a list of alternatives 
(e.g., which movie to watch, or who to elect as president) and have to choose one 
of them. Can a selection be made so as to truly reflect the preferences of the 
individuals? What does it mean for a social choice to be fair? 

When there are only two options to choose from, majority rule can be ap- 
plied to yield an outcome that more than half of the individuals find satisfactory. 
When the number of options is three or more, pairwise contests may fail to yield 
a consistent ordering. This paradox, shown in Figure 13.1, was first discovered by
the Marquis de Condorcet in the late eighteenth century. 


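[Figure 13.1 (image not reproduced): three pairwise contests among A, B, and C with vote
splits 35% to 65%, 40% to 60%, and 75% to 25%, and the resulting social preference.]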


FIGURE 13.1. In pairwise contests A defeats C and C defeats B, yet B defeats A. 


13.1. Voting and ranking mechanisms 
We begin with two examples of voting systems. 


EXAMPLE 13.1.1 (Plurality voting). In plurality voting, each voter chooses 
his or her favorite candidate, and the candidate who receives the most votes wins 
(with some tie-breaking rule). The winner need not obtain a majority of the votes. 
In the U.S., congressional elections are conducted using plurality voting. 

To compare this to other voting methods, it’s useful to consider extended plu- 
rality voting, where each voter submits a rank-ordering of the candidates and the 
candidate with the most first-place votes wins the election (with some tie-breaking 
rule). 

This voting system is attractively simple but, as shown in Figure 13.2, it has
the disturbing property that the candidate that is elected can be the least favorite
for a majority of the population. This occurred in the 1998 election for governor of
Minnesota when former professional wrestler Jesse Ventura won the election with
37% of the vote. Exit polls showed that he would have lost (by a wide margin)
one-on-one contests with each of the other two candidates.

Comparing Figure 13.2 and Figure 13.3 indicates that plurality is strategically
vulnerable. The voters who prefer C to B to A can change the outcome from A to
B without changing the relative ordering of A and B in the rankings they submit.

[Figure 13.2 (image not reproduced): three groups of voters, 45%, 30%, and 25%, whose
top choices are A, B, and C respectively, and the resulting social preference.]


FIGURE 13.2. Option A is preferred by 45% of the population, option B by
30%, and option C by 25%, and thus A wins a plurality vote. However, A is
the least favorite for 55% of the population.


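[Figure 13.3 (image not reproduced): the same three groups of 45%, 30%, and 25%, with the
third group now voting for B, and the resulting social preference.]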


FIGURE 13.3. When 25% strategically switch their votes from C to B, the 
relative ranking of A and B in the outcome changes. 
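The plurality example can be replayed in a few lines. The sketch below is only an illustration: the figures fix the top choices and the fact that A is last for 55% of the voters, while the remaining positions (A > B > C for the first group, B > C > A for the second) are assumptions, as are the function names.

```python
from collections import Counter

def plurality_ranking(profile):
    """profile: list of (weight, ranking) pairs, each ranking a string, best first.
    Returns the candidates sorted by first-place weight, highest first."""
    firsts = Counter()
    for weight, ranking in profile:
        for c in ranking:
            firsts[c] += 0            # make sure every candidate appears
        firsts[ranking[0]] += weight
    return sorted(firsts, key=lambda c: (-firsts[c], c))

def pairwise_share(profile, a, b):
    """Total weight of the voters who rank a above b."""
    return sum(w for w, r in profile if r.index(a) < r.index(b))

# Assumed full rankings consistent with the two figures.
honest = [(45, "ABC"), (30, "BCA"), (25, "CBA")]
strategic = [(45, "ABC"), (30, "BCA"), (25, "BCA")]   # third group votes for B

if __name__ == "__main__":
    print(plurality_ranking(honest), plurality_ranking(strategic))
    print(pairwise_share(honest, "A", "B"), pairwise_share(strategic, "A", "B"))
```

The A-versus-B share is the same in both profiles even though the social ranking of A and B flips, which is the sense in which the induced ranking rule is sensitive to a third alternative.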


EXAMPLE 13.1.2 (Runoff elections). A modification of plurality that avoids 
the Minnesota scenario mentioned above is runoff elections. If in the first round 
no candidate has a majority, then the two leading candidates compete in a second 
round. This system is used in many countries including India, Brazil, and France. 


[Figure 13.4 (image not reproduced): first-round shares of 30%, 45%, and 25% for A, B,
and C; C is eliminated, and the second round splits 55% to 45%.]


FIGURE 13.4. In the first round C is eliminated. When votes are redistributed,
A gets the majority.


This method is also strategically vulnerable. If voters in the second group from
Figure 13.4 knew the distribution of preferences, they could ensure a victory for
B by having some of them conceal their true preference and move C to the top of
their rankings, as shown in Figure 13.5.


[Figure 13.5 (image not reproduced): four groups of 10%, 30%, 35%, and 25% with top
choices C, A, B, and C; A is eliminated, and the second round splits 65% to 35%.]


FIGURE 13.5. Some of the voters from the second group in Figure 13.4 misrepresent
their true preferences, ensuring that A is eliminated. As a result, B
wins the election.

13.2. Definitions


We consider settings in which there is a set of candidates $\Gamma$, a set of $n$ voters, and
a rule that describes how the voters’ preferences are used to determine an outcome.
We consider two different kinds of rules. A voting rule produces a single winner,
and a ranking rule produces a social ranking over the candidates. Voting rules
are obviously used for elections, or, more generally, when a group needs to select one
of several alternatives. A ranking rule might be used when a university department
is ranking faculty candidates based on the preferences of current faculty members.

In both cases, we assume that the ranking of each voter is represented by a
preference relation $\succ$ on the set of candidates $\Gamma$ which is complete (for all $A \ne B$, either
$A \succ B$ or $B \succ A$) and transitive ($A \succ B$ and $B \succ C$ implies $A \succ C$). This
definition does not allow for ties; we discuss rankings with ties in the notes.

We use $\succ_i$ to denote voter $i$’s preference relation: $A \succ_i B$ if voter $i$ strictly
prefers candidate A to candidate B. A preference profile $(\succ_1,\dots,\succ_n)$ describes
the preference relations of all $n$ voters.


DEFINITION 13.2.1. A voting rule $f$ maps each preference profile $\pi =
(\succ_1,\dots,\succ_n)$ to an element of $\Gamma$, the winner of the election.


DEFINITION 13.2.2. A ranking rule $R$ associates to each preference profile,
$\pi = (\succ_1,\dots,\succ_n)$, a social ranking, another complete and transitive preference
relation $\rhd = R(\pi)$. ($A \rhd B$ means that A is strictly preferred to B in the social
ranking.)


REMARK 13.2.3. An obvious way to obtain a voting rule from a ranking rule
is to output the top ranked candidate. (For another way, see Exercise 13.3.) Conversely,
a voting rule yields an induced ranking rule as follows. Apply the voting
rule to select the top candidate. Then apply the voting rule to the remaining candidates
to select the next candidate and so on. However, not all ranking rules can
be obtained this way; see Exercise 13.2.


An obvious property that we would like a ranking rule $R$ to have is unanimity:
If for every voter $i$ we have $A \succ_i B$, then $A \rhd B$. In words, if every voter strictly
prefers candidate A to B, then A should be strictly preferred to B in the social
ranking.

Kenneth Arrow introduced another property called Independence of Irrelevant
Alternatives¹ (IIA): For any two candidates A and B, the preference between
A and B in the social ranking depends only on the voters’ preferences between
A and B. Formally, if $\pi = \{\succ_i\}$ and $\pi' = \{\succ'_i\}$ are two profiles for which each
voter has the same preference between A and B, i.e., $\{i \mid A \succ_i B\} = \{i \mid A \succ'_i B\}$,
then $A \rhd B$ implies $A \rhd' B$.

IIA seems appealing at first glance, but as we shall see later, it is problematic.
Indeed, almost all ranking rules violate IIA. The next lemma shows that if IIA fails,
then there exist profiles in which some voter is better off submitting a ranking that
differs from his ideal ranking.


DEFINITION 13.2.4. A ranking rule $R$ is strategically vulnerable at the
profile $\pi = (\succ_1,\dots,\succ_n)$ if there is a voter $i$ and alternatives A and B so that
$A \succ_i B$ and $B \rhd A$ in $R(\pi)$, yet replacing $\succ_i$ by $\succ^*_i$ yields a profile $\pi^*$ such that
$A \rhd^* B$ in $R(\pi^*)$.


1 This is similar to the notion by the same name from §12.4.

LEMMA 13.2.5. If a ranking rule $R$ violates IIA, then it is strategically vulnerable.


PROOF. Let $\pi = \{\succ_i\}$ and $\pi' = \{\succ'_i\}$ be two profiles that are identical with
respect to preferences between candidates A and B but differ on the social ranking
of A relative to B. That is, $\{j \mid A \succ_j B\} = \{j \mid A \succ'_j B\}$, but $A \rhd B$ in $R(\pi)$,
whereas $B \rhd' A$ in $R(\pi')$. Let $\sigma_i = (\succ'_1,\dots,\succ'_i,\succ_{i+1},\dots,\succ_n)$, so that $\sigma_0 = \pi$
and $\sigma_n = \pi'$. Let $i \in [1,n]$ be such that $A \rhd B$ in $R(\sigma_{i-1})$, but $B \rhd A$ in $R(\sigma_i)$.
If $B \succ_i A$, then $R$ is strategically vulnerable at $\sigma_{i-1}$ since voter $i$ can switch from
$\succ_i$ to $\succ'_i$. Similarly, if $A \succ_i B$, then $R$ is vulnerable at $\sigma_i$ since voter $i$ can switch
from $\succ'_i$ to $\succ_i$.


For plurality voting, as we saw in the plurality example above (compare Figure 13.3), the
induced ranking rule violates IIA. Similarly, the runoff example above shows that the ranking
rule induced by runoff elections also violates IIA, since it allows for the relative
ranking of A and B to be switched without changing any of the individual A-B
preferences.

There is one ranking rule that obviously does satisfy IIA: 


EXAMPLE 13.2.6 (Dictatorship). A ranking rule is a dictatorship if there is a
voter $v$ whose preferences are reproduced in the outcome. In other words, for every
pair of candidates A and B, we have $A \succ_v B$ if and only if $A \rhd B$.


13.3. Arrow’s Impossibility Theorem 


THEOREM 13.3.1. Any ranking rule that satisfies unanimity and independence 
of irrelevant alternatives is a dictatorship. 


What does the theorem mean? If we want to avoid dictatorship, then it need
not be optimal for voters to submit their ideal ranking; the same applies to voting
by Theorem 13.4.2. Thus, strategizing is an inevitable part of ranking and voting.
See §13.7 for a proof of Arrow’s theorem.

REMARK 13.3.2. Not only is IIA impossible to achieve under any reasonable
voting scheme; it is doubtful if it is desirable because it ignores key information in
the rankings, namely the strengths of preferences. See Figure 13.6.


[Figure 13.6 (image not reproduced): profiles $\pi$ and $\pi'$ over two groups of 50% of the voters each.]


FIGURE 13.6. Given the profile $\pi$, it seems that society should rank A above
B since, for the second group, A is their top-ranked candidate. In profile $\pi'$,
the situation is reversed, yet IIA dictates that the relative social ranking of A
and B is the same in both profiles.

13.4. The Gibbard-Satterthwaite Theorem


Arrow’s Impossibility Theorem applies to the setting where a social ranking is
produced. A similar phenomenon arises even when the goal is to select a single
candidate. Consider $n$ voters in a society, each with a complete ranking of a set of
$m$ candidates $\Gamma$ and a voting rule $f$ mapping each profile $\pi = (\succ_1,\dots,\succ_n)$ of $n$
rankings of $\Gamma$ to a candidate $f(\pi) \in \Gamma$. A voting rule for which no voter can benefit
by misreporting his ranking is called strategy-proof.


DEFINITION 13.4.1. A voting rule $f$ from profiles to $\Gamma$ is strategy-proof if for
all profiles $\pi$, candidates A and B, and voters $i$, the following holds: If $A \succ_i B$ and
$f(\pi) = B$, then all $\pi'$ that differ from $\pi$ only in voter $i$’s ranking satisfy $f(\pi') \ne A$.


THEOREM 13.4.2. Let $f$ be a strategy-proof voting rule onto $\Gamma$, where $|\Gamma| \ge 3$.
Then $f$ is a dictatorship. That is, there is a voter $i$ such that for every profile $\pi$
voter $i$’s highest ranked candidate is equal to $f(\pi)$.


The proof of the theorem is in §13.8.
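For very small elections, Definition 13.4.1 can be tested exhaustively. The following sketch enumerates all profiles of three voters over three candidates and looks for a profitable misreport; the alphabetical tie-break for plurality and all function names are assumptions made for the illustration.

```python
from itertools import permutations, product

CANDIDATES = "ABC"
RANKINGS = ["".join(p) for p in permutations(CANDIDATES)]  # the 6 strict rankings

def plurality(profile):
    """Most first-place votes; ties broken alphabetically (an assumed tie-break)."""
    counts = {c: 0 for c in CANDIDATES}
    for ranking in profile:
        counts[ranking[0]] += 1
    return min(CANDIDATES, key=lambda c: (-counts[c], c))

def dictatorship(profile):
    """Voter 0's top choice always wins."""
    return profile[0][0]

def find_manipulation(rule, n_voters):
    """Return (profile, voter, lie) violating strategy-proofness, or None."""
    for profile in product(RANKINGS, repeat=n_voters):
        outcome = rule(profile)
        for i, truth in enumerate(profile):
            for lie in RANKINGS:
                changed = profile[:i] + (lie,) + profile[i + 1:]
                new_outcome = rule(changed)
                # Voter i strictly prefers new_outcome to outcome under his true ranking.
                if truth.index(new_outcome) < truth.index(outcome):
                    return profile, i, lie
    return None

if __name__ == "__main__":
    print(find_manipulation(plurality, 3))
    print(find_manipulation(dictatorship, 3))
```

In line with Theorem 13.4.2, the search finds a manipulation for plurality and none for the dictatorship; it is a finite check for one small case, not a proof.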


13.5. Desirable properties for voting and ranking 


Arrow’s theorem is often misconstrued to imply that all voting systems are 
flawed and hence it doesn’t matter which voting system is used. In fact, there are 
many dimensions on which to evaluate voting systems and some systems are better 
than others. 

The following are desirable properties of voting systems beyond unanimity and 
IIA: 


(1) Anonymity (i.e., symmetry): The identities of the voters should not 
affect the results. I.e., if the preference orderings of voters are permuted, 
the society ranking should not change. This is satisfied by most reasonable 
voting systems, but not by the US electoral college or other regional based 
systems. Indeed, switching profiles between very few voters in California 
and Florida would have changed the results of the 2000 election between 
Bush and Gore. 

(2) Monotonicity: If a voter moves candidate A higher in his ranking with- 
out changing the order of other candidates, this should not move A down 
in the society ranking. 

(3) Condorcet winner criterion: If a candidate beats all other candidates
in pairwise contests, then he should be the winner of the election. A
related, and seemingly² weaker, property is the Condorcet loser criterion:
The system should never select a candidate that loses to all others
in pairwise contests.

(4) IIA with preference strengths: If two profiles have the same prefer- 
ence strengths for A versus B in all voter rankings, then they should yield 
the same preference order between A and B in the social ranking. (The 
preference strength of A versus B in a ranking is the number of places 
where A is ranked above B, which can be negative.) 

(5) Cancellation of ranking cycles: If there is a subset of N candidates,
and N voters whose rankings are the N cyclic shifts of one another (e.g.,
three voters each with a different ranking from Figure 13.1), then removing
these N voters shouldn’t change the outcome.


2 See Exercise 13.7.


13.6. Analysis of specific voting rules 


Next we examine the extent to which the properties just described are satisfied 
by specific voting and ranking rules. 


In instant runoff voting (IRV), also called plurality with elimination, each
voter submits a ranking, and the winner in an election with $m$ candidates is determined
as follows. If $m = 2$, majority vote is used. If $m > 2$, the candidate with
the fewest first-place votes is eliminated and removed from all the rankings. An
instant runoff election is then run on the remaining $m - 1$ candidates. When there
are three candidates, this method is equivalent to runoff elections. See Figure 13.7.


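[Figure 13.7 (image not reproduced): three groups of voters with their rankings of A, B,
and C; after the first elimination the second round splits 55% to 45%.]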


FIGURE 13.7. In the first round C is eliminated. When votes are redistributed,
A gets the majority.


IRV satisfies anonymity but fails monotonicity, as shown in Figure 13.8.


[Figure 13.8 (image not reproduced): four groups with rankings C > B > A, A > B > C,
B > C > A, and C > A > B; A is eliminated, and the second round splits 65% to 35%.]

FIGURE 13.8. When some of the voters from the second group in Figure 13.7
switch their preferences by moving B below C, B switches from being a loser
to a winner.
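The elimination procedure is short enough to sketch in code. The profiles below are a reading of Figures 13.7 and 13.8 (group sizes 30/45/25, and 30/35/10/25 after ten percent of the second group move B below C); the lower positions in each ranking are recovered from the damaged figures and should be treated as assumptions, as should the alphabetical tie-break.

```python
from collections import Counter

def irv_winner(profile):
    """profile: list of (weight, ranking) pairs; each ranking is a string, best first.
    Repeatedly eliminates the candidate with the fewest first-place votes.
    Ties for elimination are broken alphabetically (an arbitrary choice)."""
    remaining = set(profile[0][1])
    while len(remaining) > 1:
        firsts = Counter({c: 0 for c in remaining})
        for weight, ranking in profile:
            top = next(c for c in ranking if c in remaining)
            firsts[top] += weight
        loser = min(sorted(remaining), key=lambda c: firsts[c])
        remaining.remove(loser)
    return remaining.pop()

# Assumed profiles reconstructed from the two figures.
honest = [(30, "ABC"), (45, "BCA"), (25, "CAB")]
shifted = [(30, "ABC"), (35, "BCA"), (10, "CBA"), (25, "CAB")]

if __name__ == "__main__":
    print(irv_winner(honest), irv_winner(shifted))
```

Under these assumptions the only change between the two profiles is that some voters rank B lower, yet the winner changes to B, which is exactly a failure of monotonicity.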


IRV does not satisfy the Condorcet winner criterion, as shown in Figure 13.9,
but satisfies the Condorcet loser criterion since the Condorcet loser would lose the
runoff if he gets there. IRV fails IIA with preference strengths and cancellation of
ranking cycles. See Exercise 13.4.


[Figure 13.9 (image not reproduced): three groups of 40%, 40%, and 20% with top choices
A, C, and B; B is eliminated, and the second round splits 60% to 40%.]


FIGURE 13.9. B is a Condorcet winner but loses the election. 


The Burlington, Vermont, mayoral election of 2009 used IRV. The IRV winner
(Bob Kiss) was neither the same as the plurality winner (Kurt Wright) nor the
Condorcet winner (Andy Montroll, who was also the Borda count winner; see definition
below). As a consequence, the IRV method was repealed in Burlington by a
vote of 52% to 48% in 2010.


Borda count is a ranking rule in which each voter’s ranking is used to assign 
points to the candidates. If there are m candidates, then m points are assigned to 
each voter’s top-ranked candidate, m — 1 points to the second-ranked candidate, 
and so on. The candidates are then ranked in decreasing order of their point totals 
(with ties broken arbitrarily). Borda count is equivalent to giving each candidate 
the sum of the votes he would get in pairwise contests with all other candidates. 

The Borda count satisfies anonymity, IIA with preference strengths, monotonicity,
the Condorcet loser criterion, and cancellation of ranking cycles. It does not
satisfy the Condorcet winner criterion, e.g., if 60% of the population has preferences
$A \succ B \succ C$ and the remaining 40% have preferences $B \succ C \succ A$: here A wins both
pairwise contests, yet B has the highest Borda score. This example
illustrates a weakness of the Condorcet winner criterion: It ignores the strength of
preferences.

By Arrow’s Theorem, the Borda count violates IIA and is strategically vulnerable.
In the example shown in the figure below, A has an unambiguous majority of
votes and is also the winner.


[Figure 13.10 (image not reproduced): three groups of 51%, 45%, and 4% and the resulting
social preference. In an election with 100 voters the Borda scores are A: 206, B: 190, C: 204.]


FIGURE 13.10. Alternative A has the overall majority and is the winner under 
Borda count. 


However, if supporters of C (the third group) were to strategically rank B 
above A, they could ensure a victory for C. This is also a violation of IIA since 
none of the individual A-C preferences had been changed. 


[Figure 13.11 (image not reproduced): the same three groups of 51%, 45%, and 4%, with the
third group now ranking B above A, and the resulting Borda scores and social preference.]


FIGURE 13.11. Supporters of C can ensure his win by moving B up in their rankings. 
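The Borda manipulation can be reproduced numerically. The rankings below are not legible in the figures; they are one profile consistent with the stated scores (A: 206, B: 190, C: 204 for 100 voters) and with the description of the third group's deviation, so they should be read as an assumption, as should the function name.

```python
def borda_scores(profile):
    """profile: list of (voters, ranking) pairs; each ranking is a string, best first.
    With m candidates the top choice earns m points, the next m - 1, and so on."""
    m = len(profile[0][1])
    scores = {c: 0 for c in profile[0][1]}
    for voters, ranking in profile:
        for position, c in enumerate(ranking):
            scores[c] += voters * (m - position)
    return scores

# Assumed rankings that reproduce the scores quoted for Figure 13.10.
honest = [(51, "ACB"), (45, "BCA"), (4, "CAB")]
# The third group moves B above A; nobody changes an A-versus-C comparison.
strategic = [(51, "ACB"), (45, "BCA"), (4, "CBA")]

if __name__ == "__main__":
    print(borda_scores(honest))
    print(borda_scores(strategic))
```

Four voters lowering A by one position is enough to move A's total below C's, even though no voter's comparison of A and C has changed.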


A positional voting method is determined by a fixed vector $a$ with $a_1 \ge a_2 \ge
\cdots \ge a_m$ (where $m$ is the number of candidates) as follows: Each voter assigns $a_1$ points
to his top candidate, $a_2$ to the second, etc.; the social ranking is determined by the
point totals. Plurality and Borda count are positional voting methods. Every positional
voting method satisfies anonymity, monotonicity, and cancellation of ranking cycles.
See Figure 13.12 for a relevant example. No positional method satisfies the Condorcet
winner criterion (see Figure 13.12), and the only one that satisfies IIA with preference
strengths is the Borda count.


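Number of voters:   30    1   29   10   10    1
First choice:        A    A    B    B    C    C
Second choice:       B    C    A    C    A    B
Third choice:        C    B    C    A    B    A

Left group (10 voters each):   A > B > C,  B > C > A,  C > A > B
Middle group (1 voter each):   A > C > B,  C > B > A,  B > A > C
Right group:                   20 voters A > B > C,  28 voters B > A > C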


FIGURE 13.12. The table on top shows the rankings for a set of voters. (The 
top line gives the number of voters with each ranking.) One can readily 
verify that A wins all pairwise contests. However, for this voter profile, the 
Borda count winner is B. The three tables below provide a different rationale 
for B winning the election by dividing the voters into groups. The left and 
middle groups are ranking cycles. After cancellation of ranking cycles, the 
only voters remaining are in the bottom rightmost table. In this group, B is 
the clear winner. It follows that, in the original ranking, B is the winner for 
any positional method. 
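The claims in the caption can be verified directly. The profile in the sketch below is a reconstruction of the damaged table (the six rankings and their counts 30, 1, 29, 10, 10, 1), and the weight vectors other than Borda's are arbitrary illustrative choices with $a_1 > a_2$.

```python
# Reconstructed profile for Figure 13.12: (number of voters, ranking, best first).
PROFILE = [(30, "ABC"), (1, "ACB"), (29, "BAC"), (10, "BCA"), (10, "CAB"), (1, "CBA")]

def pairwise_votes(profile, a, b):
    """Number of voters ranking a above b."""
    return sum(n for n, r in profile if r.index(a) < r.index(b))

def condorcet_winner(profile):
    """Candidate who beats every other candidate head to head, or None."""
    candidates = profile[0][1]
    total = sum(n for n, _ in profile)
    for a in candidates:
        if all(2 * pairwise_votes(profile, a, b) > total for b in candidates if b != a):
            return a
    return None

def positional_scores(profile, weights):
    """Score of each candidate when position k in a ranking earns weights[k]."""
    scores = {c: 0 for c in profile[0][1]}
    for n, ranking in profile:
        for position, c in enumerate(ranking):
            scores[c] += n * weights[position]
    return scores

if __name__ == "__main__":
    print(condorcet_winner(PROFILE))
    for weights in [(3, 2, 1), (1, 0, 0), (5, 1, 0)]:   # Borda, plurality, an arbitrary vector
        print(weights, positional_scores(PROFILE, weights))
```

For this profile B's total exceeds A's by $8(a_1 - a_2)$, so B comes out ahead under every weight vector with $a_1 > a_2$, while A still wins both head-to-head contests.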


Approval voting is a voting scheme used by various professional societies, 
such as the American Mathematical Society and the American Statistical Associa- 
tion. In this procedure, the voters can approve as many candidates as they wish. 
Candidates are then ranked in the order of the number of approvals they received. 


[Figure 13.13 (image not reproduced): a sample approval ballot with the instruction "Vote for
all acceptable candidates"; Candidate A and Candidate C are marked and Candidate B is not.]


FIGURE 13.13. Candidates A and C will each receive one vote.

Given the ranking of the candidates by a voter, we may assume the voter
selects some k and approves his top k choices. Approval voting satisfies anonymity, 
monotonicity, and a version of IIA: If voter i approved candidate A and disapproved 
of B, yet B was ranked above A by society, then there is no modification of voter i’s 
input that can reverse this. There is no contradiction to Arrow’s theorem because 
voters are not submitting a ranking. However, given a voter’s preference ranking, 
he must choose how many candidates to approve, and the optimal choice depends 
on other voters’ approvals. So strategizing is important here as well. 

In close elections, approval voting often reduces to plurality, as each voter only 
approves his top choice. 


13.7. Proof of Arrow’s Impossibility Theorem* 


Recall Theorem 13.3.1, which says that any ranking rule that satisfies unanimity
and independence of irrelevant alternatives is a dictatorship.

Fix a ranking rule R that satisfies unanimity and IIA. The proof we present 
requires that we consider extremal candidates, those that are either most preferred 
or least preferred by each voter. The proof is written so that it applies verbatim 
to rankings with ties, as discussed in the notes; therefore, we occasionally refer to 
“strict” preferences. 


LEMMA 13.7.1 (Extremal Lemma). Consider an arbitrary candidate B. For 
any profile $\pi$ in which B has an extremal rank for each voter (i.e., B is strictly 
preferred to all other candidates or all other candidates are strictly preferred to B), 
B has an extremal rank in the social ranking $R(\pi)$. 


PROOF. Suppose not. Then for such a profile $\pi$, with $\succ\, = R(\pi)$, there are 
two candidates A and C such that $A \succeq B$ and $B \succeq C$. Consider a new profile 
$\pi' = (\succ'_1, \ldots, \succ'_n)$ obtained from $\pi$ by having every voter move C just above A in 
his ranking (if C is not already above A). See Figure 13.14. None of the AB or 
BC preferences change since B started out and stays in the same extremal rank. 
Hence, by IIA, in the outcome $\succ'\, = R(\pi')$, we have $A \succeq' B$ and $B \succeq' C$, and 
hence $A \succeq' C$. But this violates unanimity since for all voters $i$ in $\pi'$, we have 
$C \succ'_i A$. 


DEFINITION 13.7.2. Let B be a candidate. Voter $i$ is said to be B-pivotal for 
a ranking rule $R(\cdot)$ if there exist profiles $\pi_1$ and $\pi_2$ such that 
• B is extremal for all voters in both profiles; 
• the only difference between $\pi_1$ and $\pi_2$ is that B is strictly lowest ranked 
by $i$ in $\pi_1$ and B is strictly highest ranked by $i$ in $\pi_2$; 
• B is ranked strictly lowest in $R(\pi_1)$ and strictly highest in $R(\pi_2)$. 


Such a voter has the “power” to move candidate B from the very bottom of 
the social ranking to the very top. 


LEMMA 13.7.3. For every candidate B, there is a B-pivotal voter v(B). 


PROOF. Consider an arbitrary profile in which candidate B is ranked strictly 
lowest by every voter. By unanimity, all other candidates are strictly preferred to 
B in the social ranking. Now consider a sequence of profiles obtained by letting 
the voters, one at a time, move B from the bottom to the top of their rankings. 
By the extremal lemma, for each one of these profiles, B is either at the top or at 
the bottom of the social ranking. Also, by unanimity, as soon as all the voters put 
B at the top of their rankings, so must the social ranking. Hence, there is a first 
voter $v$ whose change in preference precipitates the change in the social ranking of 
candidate B. This change is illustrated in Figure 13.15, where $\pi_1$ is the profile just 
before $v$ has switched B to the top with $\succ_1 = R(\pi_1)$ and $\pi_2$ the profile immediately 
after the switch with $\succ_2 = R(\pi_2)$. This voter $v$ is B-pivotal. 


FIGURE 13.14. Illustration of the proof of Lemma 13.7.1. The bottom figure 
shows what happens when every voter whose preference in $\pi$ has C below A 
moves C just above A. 


FIGURE 13.15. When, in $\pi_1$, voter $v$ moves B to the top of his ranking, 
resulting in $\pi_2$, B moves to the top of the social ranking. 


LEMMA 13.7.4. If voter $v$ is B-pivotal, then $v$ is a dictator on $\Gamma \setminus \{B\}$; i.e., for 
any profile $\pi$, if A, B, and C are distinct and $A \succ_v C$ in $\pi$, then $A \succ C$ in $R(\pi)$. 

PROOF. Since $v$ is B-pivotal, there are profiles $\pi_1$ and $\pi_2$ such that B is 
extremal for all voters, B is ranked lowest by $v$ in $\pi_1$ and in $R(\pi_1)$, and B is ranked 
highest by $v$ in $\pi_2$ and in $R(\pi_2)$. 


FIGURE 13.16. In $\pi_3$, B is in the same extremal position as in $\pi_2$, except 
that A is just above B for voter $v$. Otherwise, the preferences in $\pi_3$ are 
arbitrary. 


Suppose that A, B, and C are distinct and let $\pi$ be a profile in which $A \succ_v C$. 

Construct profile $\pi_3$ from $\pi$ as follows: 

• For each voter $i \ne v$, move B to the extremal position he has in $\pi_2$. 

• Let $v$ rank A first and B second. 

Let $\succ_3 = R(\pi_3)$. Then the preferences between A and B in $\pi_3$ are the same as in 
$\pi_1$ and thus, by IIA, we have $A \succ_3 B$. Also, the preferences between B and C in 
$\pi_3$ are the same as in $\pi_2$ and thus, by IIA, we have $B \succ_3 C$. Hence, by transitivity, 
we have $A \succ_3 C$. See Figure 13.16. Since the A-C preferences of all voters in $\pi$ 
are the same as in $\pi_3$, IIA implies that we must have $A \succ C$ in $R(\pi)$. 


PROOF OF THEOREM 13.3.1. By Lemmas 13.7.3 and 13.7.4, there is a B-pivotal 
voter $v = v(B)$ that is a dictator on $\Gamma \setminus \{B\}$. Let $\pi_1$ and $\pi_2$ be the profiles 
from the definition of $v$ being B-pivotal. We claim that for any other candidate, 
say C, the C-pivotal voter $v' = v(C)$ is actually the same voter; i.e., $v = v'$. 

To see this, consider A distinct from both B and C. We know that in $\succ_1 = R(\pi_1)$, we 
have $A \succ_1 B$, and in $\succ_2 = R(\pi_2)$, we have $B \succ_2 A$. Moreover, by Lemma 13.7.4, $v'$ dictates 
the strict preference between A and B in both of these outcomes. But in both 
profiles, the strict preference between A and B is the same for all voters other than 
$v$. Hence $v' = v$, and thus $v$ is a dictator (over all of $\Gamma$). 


13.8. Proof of the Gibbard-Satterthwaite Theorem* 


Recall Theorem 13.4.2, which says that if $f$ is a strategy-proof voting rule onto 
$\Gamma$, where $|\Gamma| \ge 3$, then $f$ is a dictatorship. That is, there is a voter $i$ such that 
for every profile $\pi$, voter $i$'s highest ranked candidate is equal to $f(\pi)$. 

We deduce this theorem from Arrow’s theorem, by showing that if f is strategy- 
proof and is not a dictatorship, then it can be extended to a ranking rule that 
satisfies unanimity, IIA, and that is not a dictatorship, a contradiction. 

The following notation will also be useful. 

DEFINITION 13.8.1. Let $\pi = (\succ_1, \ldots, \succ_n)$ and $\pi' = (\succ'_1, \ldots, \succ'_n)$ be two 
profiles and let $r_i(\pi, \pi')$ denote the profile $(\succ'_1, \ldots, \succ'_i, \succ_{i+1}, \ldots, \succ_n)$. Thus 
$r_0(\pi, \pi') = \pi$ and $r_n(\pi, \pi') = \pi'$. 


We will repeatedly use the following lemma. 


LEMMA 13.8.2. Suppose that $f$ is strategy-proof. Consider two profiles $\pi = 
(\succ_1, \ldots, \succ_n)$ and $\pi' = (\succ'_1, \ldots, \succ'_n)$ and two candidates X and Y such that 
• all preferences between X and Y in $\pi$ and $\pi'$ are the same 
(i.e., $X \succ_i Y$ iff $X \succ'_i Y$ for all $i$); 
• in $\pi'$ all voters prefer X to all candidates other than Y 
(i.e., $X \succ'_i Z$ for all $i$ and all $Z \notin \{X, Y\}$); 
• $f(\pi) = X$. 
Then $f(\pi') = X$. 


FIGURE 13.17. An illustration of the statement of Lemma 13.8.2. 


PROOF. Let $r_i := r_i(\pi, \pi')$. We have $f(r_0) = X$ by assumption. We prove by 
induction on $i$ that $f(r_i) = X$ or else $f$ is not strategy-proof. To this end, suppose 
that $f(r_{i-1}) = X$. Observe that $r_{i-1}$ and $r_i$ differ only in voter $i$'s ranking: In $r_{i-1}$ 
it is $\succ_i$ and in $r_i$ it is $\succ'_i$. 

There are two cases to rule out: If $f(r_i) = Z \notin \{X, Y\}$, then in profile $r_i$, voter 
$i$ has an incentive to lie and report $\succ_i$ instead of $\succ'_i$. 




On the other hand, suppose $f(r_i) = Y$. If $X \succ_i Y$, then in profile $r_i$, voter $i$ 
has an incentive to lie and report $\succ_i$ instead of $\succ'_i$. On the other hand, if $Y \succ_i X$, 
then in profile $r_{i-1}$, voter $i$ has an incentive to lie and report $\succ'_i$ instead of $\succ_i$. 


We also need the following definition. 


DEFINITION 13.8.3. Let $S$ be a subset of the candidates $\Gamma$, and let $\pi$ be a 
ranking of the candidates $\Gamma$. Define a new ranking $\pi^S$ by moving all candidates in 
$S$ to the top of the ranking, maintaining the same relative ranking between them, 
as well as the same relative ranking between all candidates not in $S$. (For a profile 
$\pi$, the profile $\pi^S$ is obtained by applying this to each voter's ranking.) 
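
Definitions 13.8.1 and 13.8.3 are both simple list manipulations, and a small sketch may make them concrete. The representation (a ranking as a Python list from most to least preferred, a profile as a list of rankings) and the function names `hybrid_profile`, `promote`, and `promote_profile` are assumptions made for illustration only.

```python
def hybrid_profile(pi, pi_prime, i):
    """r_i(pi, pi'): the first i voters report as in pi', the rest as in pi."""
    assert len(pi) == len(pi_prime) and 0 <= i <= len(pi)
    return list(pi_prime[:i]) + list(pi[i:])

def promote(ranking, S):
    """pi^S for a single ranking: move the candidates in S to the top,
    keeping relative order both within S and outside S."""
    top = [c for c in ranking if c in S]
    rest = [c for c in ranking if c not in S]
    return top + rest

def promote_profile(pi, S):
    """Apply promote to every voter's ranking in the profile."""
    return [promote(ranking, S) for ranking in pi]

pi = [["A", "B", "C", "D"], ["D", "C", "B", "A"]]
pi_S = promote_profile(pi, {"B", "D"})
```

Stepping `i` from 0 to n in `hybrid_profile(pi, pi_S, i)` produces the one-voter-at-a-time sequence of profiles used in the proofs of this section.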


CLAIM 13.8.4. Let $f$ be strategy-proof and onto $\Gamma$. Then for any profile $\pi$ and 
every nonempty subset $S$ of the candidates $\Gamma$, it must be that $f(\pi^S) \in S$. 


PROOF. Take any $A \in S$. Since $f$ is onto, there is a profile $\tilde\pi$ such that 
$f(\tilde\pi) = A$. Consider the sequence of profiles $r_i = r_i(\tilde\pi, \pi^S)$, with $0 \le i \le n$. We 
claim that $f(r_{i-1}) \in S$ implies that $f(r_i) \in S$. Otherwise, on profile $r_i$, voter $i$ has 
an incentive to lie and report $\tilde\succ_i$ instead of $\succ^S_i$. Thus, since $f(r_0) = f(\tilde\pi) \in S$, we 
conclude that $f(r_n) = f(\pi^S) \in S$ as well. 


PROOF OF THEOREM 13.4.2. Let $f$ be strategy-proof, onto, and a nondictatorship. 
Define a ranking rule $R(\pi)$ as follows. For each pair of candidates A and 
B, let $A \succ B$ if $f(\pi^{\{A,B\}}) = A$ and $B \succ A$ if $f(\pi^{\{A,B\}}) = B$. (Claim 13.8.4 
guarantees that these are the only two possibilities.) 

To see that this is a bona fide ranking rule, we observe that these pairwise 
rankings are transitive. If not, there is a triple of candidates such that $A \succ B$, 
$B \succ C$, and $C \succ A$. Let $S = \{A, B, C\}$. We know that $f(\pi^S) \in S$; without loss 
of generality $f(\pi^S) = A$. Applying Lemma 13.8.2, with the profiles $\pi^S$ and $\pi^{\{A,C\}}$ in the roles of $\pi$ and $\pi'$, 
$X = A$ and $Y = C$, we conclude that $f(\pi^{\{A,C\}}) = A$ and $A \succ C$, a contradiction. 


Next, we verify that the ranking rule R satisfies unanimity, IIA, and is not a 
dictatorship. 

Unanimity follows from the fact that if in $\pi$ all voters have $A \succ_i B$, then 
$(\pi^{\{A,B\}})^{\{A\}} = \pi^{\{A,B\}}$, and thus by Claim 13.8.4, $f(\pi^{\{A,B\}}) = A$. 

To see that IIA holds, let $\pi_1$ and $\pi_2$ be two profiles that agree on all of their 
AB preferences. Then by Lemma 13.8.2, with $\pi = \pi_1^{\{A,B\}}$ and $\pi' = \pi_2^{\{A,B\}}$, and 
Claim 13.8.4, we conclude that $f(\pi_1^{\{A,B\}}) = f(\pi_2^{\{A,B\}})$, and hence IIA holds. 

Finally, the ranking rule $R$ is not a dictatorship because $f$ is not a dictatorship: 
For every voter $v$, there is a candidate A and a profile $\pi$ in which A is $v$'s highest 
ranked candidate, but $f(\pi) = B \ne A$. Then, applying Lemma 13.8.2 to the pair of 
profiles $\pi$ and $\pi^{\{A,B\}}$, with $X = B$ and $Y = A$, we conclude that $f(\pi^{\{A,B\}}) = B$, 
and thus $B \succ A$ in the outcome of the election. Hence voter $v$ is not a dictator 
relative to the ranking rule $R$. 


Notes 
The history of social choice and voting is fascinating [ASS02, ASS11, Pro11]; 
we mention only a few highlights here: Chevalier de Borda proposed the Borda count in 
1770 when he discovered that the Plurality method then used by the French Academy of 
Sciences was vulnerable to strategic manipulation. The Borda count was then used by the 
Academy for the next two decades. The method of pairwise contests referred to in the 
beginning of this chapter was proposed by the Marquis de Condorcet after he demonstrated 
that the Borda count was also vulnerable to strategic manipulation. Apparently, Borda’s 
response was that his scheme is “intended only for honest men.” Condorcet proceeded to 
show a vulnerability in his own method—a tie in the presence of a preference cycle [C85]. 

Besides the properties of voting systems discussed in the text, other important prop- 
erties include: 


• Consistency: If separate elections are run with two groups of voters and yield 
the same social ranking, then combining the groups should yield the same ranking. 
See Exercise 13.5. 

• Participation: The addition of a voter who strictly prefers candidate A to 
B should not change the winner from candidate A to candidate B. See Exercise 13.6. 
Moulin showed that every method that elects the Condorcet winner, when 
there is one, must violate the participation property, assuming four or more 
candidates [Mou88b]. 

• Reversal symmetry: If all the rankings are reversed, then the social ranking 
should also be reversed. 

• Invariance to candidates dropping out: If a candidate who loses an election 
on a given profile drops out, the winner of the election should not change. 
This property sounds reasonable, but none of the voting methods we discuss 
satisfy it. (It is satisfied by approval voting only if the candidate dropping out 
does not affect the other approvals.) Moreover, like IIA, it is not clear that this 
property is desirable: When a candidate drops out, some information is lost on 
the strength of preferences between other candidates. 


The example in Figure 13.12 is from [dC85]; the analysis is due to [Saa95]. 


Since there need not be a Condorcet winner, various methods known as Condorcet 
completions have been proposed. For instance, a voting system known as Black’s method 
elects the Condorcet winner if there is one and otherwise applies Borda count. A method 
known as Copeland’s rule elects the candidate who wins the most pairwise comparisons. 
Charles Dodgson (also known as Lewis Carroll) proposed the following Condorcet comple- 
tion. For each candidate, count how many adjacent swaps in voters’ rankings are needed 
to make him a Condorcet winner. The Dodgson winner is the candidate that minimizes 
this count. However, it is NP-hard to determine the Dodgson winner [BTT89]. 
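
Two of the Condorcet completions just described, Copeland's rule and Black's method, are easy to state as code. The following is a minimal sketch, not a reference implementation: the function names are hypothetical, rankings are lists from most to least preferred, tied winners are returned as a list, and the Borda scores used inside Black's method give m - 1 points to a first choice down to 0 for a last choice (an assumed normalization; any shift of the scores gives the same winner).

```python
from itertools import combinations

def pairwise_wins(profile, candidates):
    """For each candidate, count the head-to-head majority contests it wins."""
    wins = {c: 0 for c in candidates}
    for a, b in combinations(candidates, 2):
        a_votes = sum(1 for r in profile if r.index(a) < r.index(b))
        b_votes = len(profile) - a_votes
        if a_votes > b_votes:
            wins[a] += 1
        elif b_votes > a_votes:
            wins[b] += 1
    return wins

def condorcet_winner(profile, candidates):
    """The candidate who beats every other one head-to-head, or None."""
    wins = pairwise_wins(profile, candidates)
    for c in candidates:
        if wins[c] == len(candidates) - 1:
            return c
    return None

def copeland_winners(profile, candidates):
    """Copeland's rule: the candidates winning the most pairwise comparisons."""
    wins = pairwise_wins(profile, candidates)
    best = max(wins.values())
    return [c for c in candidates if wins[c] == best]

def borda_winners(profile, candidates):
    """Borda count with scores m-1, ..., 0 (assumed normalization)."""
    m = len(candidates)
    score = {c: 0 for c in candidates}
    for r in profile:
        for position, c in enumerate(r):
            score[c] += m - 1 - position
    best = max(score.values())
    return [c for c in candidates if score[c] == best]

def black_winners(profile, candidates):
    """Black's method: the Condorcet winner if one exists, else Borda count."""
    w = condorcet_winner(profile, candidates)
    return [w] if w is not None else borda_winners(profile, candidates)

candidates = ["A", "B", "C"]
cycle = [["A", "B", "C"]] * 2 + [["B", "C", "A"]] * 2 + [["C", "A", "B"]]
majority = [["A", "B", "C"]] * 3 + [["B", "C", "A"]] * 2
```

In the `cycle` profile there is no Condorcet winner, so Black's method falls back on Borda count; in the `majority` profile, candidate A is the Condorcet winner even though B has the higher Borda score.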


Kenneth Arrow Donald Saari 


Among known voting methods, Borda count and approval voting stand out for 
satisfying key desirable properties. Borda count has the advantage of allowing voters to submit 
a full ranking. Donald Saari [Saa95] showed that when the number of strategic voters is 
small, the Borda count is the least susceptible to strategic manipulation among positional 
methods; i.e., the number of profiles susceptible to manipulation is smallest. 

Approval voting does not allow voters to submit a full ranking. At the other extreme, 
in some voting methods, voters provide information beyond a ranking. For example, 
in cumulative voting, each voter is given $\ell$ votes, which he can distribute among the 
candidates at will. Candidates are then ranked by vote totals. 

Cumulative voting satisfies anonymity and monotonicity. It also satisfies IIA with 
preference strengths, if preference strengths are interpreted as the difference in the number 
of votes. 

Finally, cumulative voting enables minority representation: Given $k$, a 
coordinated minority comprising a proportion $p$ of the voters is able to determine the selection 
of $\lfloor kp \rfloor$ among the top $k$ in the society ranking. See Exercise 13.8. 

In 1949, Kenneth Arrow proved his famous Impossibility Theorem [Arr51]. The proof 
presented here is adapted from Geanakoplos [Gea04]. The Gibbard-Satterthwaite Theorem 
is from and [Sat75]. 

In 1972, Arrow was awarded the Nobel Prize in Economics (jointly with John Hicks), 
for “their pioneering contributions to general economic equilibrium theory and welfare 
theory.” 

For more on voting and social choice, see 
[AH94]. Recent work in the computer science community has focused on the use of ap- 
proximation to bypass some impossibility results and on connections of social choice with 
computational complexity, noise sensitivity, and sharp thresholds. See, e.g., 
IMBC? iG). 

For more on the 2009 Burlington, Vermont, mayoral election, see [GHS09]. 


Arrow’s Impossibility Theorem 


We have presented here a simplified proof of Arrow’s theorem that is due to Geanako- 
plos [Gea04]. The version in the text assumes that each voter has a complete ranking of 
all the candidates. However, in many cases voters are indifferent between certain subsets 
of candidates. To accommodate this possibility, one can generalize the setting as follows. 

Assume that the preferences of each voter are described by a relation $\succeq$ on the set of 
candidates $\Gamma$ which is reflexive ($\forall A$, $A \succeq A$), complete ($\forall A, B$, $A \succeq B$ or $B \succeq A$ or 
both), and transitive ($A \succeq B$ and $B \succeq C$ implies $A \succeq C$). 

As in the chapter, we use $\succeq_i$ to denote the preference relation of voter $i$: $A \succeq_i B$ 
if voter $i$ weakly prefers candidate A to candidate B. However, we can now distinguish 
between strict preferences and indifference. As before, we use the notation $A \succ_i B$ to 
denote a strict preference; i.e., $A \succeq_i B$ but $B \not\succeq_i A$. (If $A \succeq_i B$ and $B \succeq_i A$, then voter 
$i$ is indifferent between the two candidates.) 

A reflexive, complete, and transitive relation > can be described in two other equiv- 
alent ways: 


• It is a set of equivalence classes (each equivalence class is a set of candidates that 
the voter is indifferent between), with a total order on the equivalence classes. 
In other words, it is a ranking that allows for ties. 

• It is the ranking induced by a function $g : \Gamma \to \mathbb{R}$ from the candidates to the 
reals, such that $A \succeq B$ if and only if $g(A) \ge g(B)$. Obviously, many functions 
induce the same preference relation. 
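
The passage between the two descriptions above can be sketched as follows. This is an illustrative assumption-laden example: the score function $g$ is represented as a Python dictionary, and `weak_order_from_scores` is a hypothetical helper that returns the equivalence classes from most to least preferred.

```python
from itertools import groupby

def weak_order_from_scores(g):
    """Equivalence classes of the ranking induced by g, best class first."""
    ordered = sorted(g, key=lambda c: -g[c])
    return [set(group) for _, group in groupby(ordered, key=lambda c: g[c])]

def weakly_prefers(g, a, b):
    """A is weakly preferred to B exactly when g(A) >= g(B)."""
    return g[a] >= g[b]

g = {"A": 2.0, "B": 1.0, "C": 2.0, "D": 0.0}
# Any increasing transformation of g induces the same preference relation.
g_rescaled = {c: 10 * score + 3 for c, score in g.items()}
classes = weak_order_from_scores(g)
```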


A ranking rule $R$ associates to each preference profile, $\pi = (\succeq_1, \ldots, \succeq_n)$, another 
reflexive, complete, and transitive preference $\succeq\, = R(\pi)$. 

In this more general setting, the definitions of unanimity and IIA are essentially 
unchanged. (Formally, IIA states that if $\pi = \{\succeq_i\}$ and $\pi' = \{\succeq'_i\}$ are two profiles such 
that $\{i \mid A \succeq_i B\} = \{i \mid A \succeq'_i B\}$ and $\{i \mid B \succeq_i A\} = \{i \mid B \succeq'_i A\}$, then $A \succeq B$ implies 
$A \succeq' B$.) 

Arrow's Impossibility Theorem in this setting is virtually identical to the version given 
in the text: Any ranking rule that satisfies unanimity and IIA is a dictatorship. The only 
difference is that, in the presence of ties, voters other than the dictator can influence the 
outcome with respect to candidates that the dictator is indifferent between. Formally, 
in this more general setting, a dictator is a voter $v$ all of whose strict preferences are 
reproduced in the outcome. 

It is straightforward to check that the proof presented in Section 13.7 goes through 
unchanged. 


Exercises 


13.1. Give an example where one of the losing candidates in a runoff election 
would have a greater support than the winner in a one-on-one contest. 


13.2. Describe a ranking rule that is not the induced ranking rule of any voting 
rule. 


13.3. Another way to go from a ranking rule to a voting rule is the following: 
Use the ranking rule. Eliminate the lowest ranked candidate. Repeat until 
one candidate remains. Apply this procedure and the one in the text to 
vote counting. What voting rule do you get in the two cases? 


S 13.4. Show that Instant Runoff violates the Condorcet winner criterion, IIA with 
preference strengths and cancellation of ranking cycles. 


13.5. The consistency property of a voting rule says: If separate elections are run 
with two groups of voters and yield the same social ranking, then combin- 
ing the groups should yield the same ranking. Show that plurality, Borda 
count, and approval voting satisfy consistency but IRV does not. 


13.6. The participation criterion requires that the addition of a voter who strictly 
prefers candidate A to B should not change the winner from candidate A 
to candidate B. Show that plurality, approval voting, and Borda count 
satisfy the participation criterion, whereas instant runoff voting doesn’t. 


13.7. Consider the following voting method: If there is a Condorcet winner, he is 
selected. Otherwise, plurality is used. Clearly this satisfies the Condorcet 
winner condition. Give an example with four candidates where it doesn’t 
satisfy the Condorcet loser criterion. 


13.8. Show that cumulative voting enables minority representation: Given $k$, a 
coordinated minority comprising a proportion $p$ of the voters can determine 
the selection of $\lfloor kp \rfloor$ among the top $k$ in the society ranking. 


13.9. Determine which of the voting methods in the text satisfy reversal symmetry; 
that is, if all the rankings are reversed, then the social ranking should 
also be reversed. 


13.10. Show that the assumption in Theorem 13.4.2 that $f$ is onto $\Gamma$, where 
$|\Gamma| \ge 3$, can be weakened to the assumption that the image of $f$ has size 
at least 3. Show that 3 cannot be replaced by 2. 


CHAPTER 14 


Auctions 


Auctions are an ancient mechanism for buying and selling goods. Indeed, the 
entire Roman Empire was sold by auction in 193 A.D. The winner was Didius 
Julianus. He became the next emperor but was in power only two months before 
he was overthrown and executed by Septimius Severus. 

In modern times, many economic transactions are conducted through auctions: 
The US government runs auctions to sell treasury bills, spectrum licenses, and 
timber and oil leases, among other things. Christie’s and Sotheby’s run auctions 
to sell art. In the age of the Internet, we can buy and sell goods and services via 
auction, using the services of companies like eBay. The advertisement auctions 
that companies like Google and Microsoft run in order to sell advertisement slots 
on their webpages bring in significant revenue. 

Why might a seller use an auction as opposed to simply fixing a price? Primarily 
because sellers often don’t know how much buyers value their goods and don’t want 
to risk setting prices that are either too low, thereby leaving money on the table, 
or so high that nobody will want to buy the item. An auction is a technique for 
dynamically setting prices. Auctions are particularly important these days because 
of their prevalence in Internet settings where the participants in the auction are 
computer programs or individuals who do not know each other. 


14.1. Single item auctions 


We are all familiar with the famous English, or ascending, auction for selling 
a single item: The auctioneer starts by calling out a low price p. As long as there 
are at least two people willing to pay the price p, he increases p by a small amount. 
This continues until there is only one player left willing to pay the current price, at 
which point that player “wins” the auction, i.e., receives the item at that price. 

When multiple rounds of communication are inconvenient, the English auction 
is sometimes replaced by other formats. For example, in a sealed-bid first-price 
auction, the participants submit sealed bids to the auctioneer. The auctioneer 
allocates the item to the highest bidder who pays the amount she bid. 

We'll begin by examining auctions from two perspectives: What are equilibrium 
bidding strategies and what is the resulting revenue of the auctioneer? 

To answer these questions, we need to know what value each bidder places on 
the item and what bidders know about each other. For example, in an art auction, 
the value a bidder places on a painting is likely to depend on other people’s values 
for that painting, whereas in an auction for fish among restaurant owners, each 
bidder’s value is known to him before the auction and is roughly independent of 
other bidders’ valuations. 


FIGURE 14.1. The Tsukiji Fish Market in Tokyo is one of the largest and 
busiest fish markets in the world. Each day before 5 a.m. a tuna auction is 
conducted there. The right-hand figure depicts the 2012 auction at Sotheby’s 
for “The Scream” by Edvard Munch. The painting sold for 120 million dollars, 
about 40 million dollars more than the preauction estimates. 


14.1.1. Bidder model. For most of this chapter, we will assume that each 
player has a private value $v$ for the item being auctioned. This means that he 
would not want to pay more than $v$ for the item: If he gets the item at a price $p$, 
his utility¹ is $v - p$, and if he doesn’t get the item (and pays nothing), his utility is 0. 
Given the rules of the auction and any knowledge he has about other players’ bids, 
he will bid so as to maximize his utility. 

In the ascending auction, it is a dominant strategy for a bidder to increase 
his bid as long as the current price is below his value; i.e., doing this maximizes 
his utility no matter what the other bidders do. But how should a player bid in 
a sealed-bid first-price auction? Clearly, bidding one’s value makes no sense since 
even upon winning, this would result in a utility of 0. So a bidder will want to bid 
lower than his true value. But how much lower? Low bidding has the potential 
to increase a player’s gain, but at the same time it increases the risk of losing the 
auction. In fact, the optimal bid in such an auction depends on how the other 
players are bidding, which, in general, a bidder will not know. 


DEFINITION 14.1.1. A (direct) single-item auction $\mathcal{A}$ with $n$ bidders is a 
mapping that assigns to any vector of bids $b = (b_1, \ldots, b_n)$ a winner and a set of 
prices. The allocation rule² of auction $\mathcal{A}$ is denoted by $\alpha^{\mathcal{A}}[b] = (\alpha_1[b], \ldots, \alpha_n[b])$, 
where $\alpha_i[b]$ is 1 if the item is allocated to bidder $i$ and 0 otherwise.³ The payment 
rule of $\mathcal{A}$ is denoted by $\mathcal{P}^{\mathcal{A}}[b] = (\mathcal{P}_1[b], \ldots, \mathcal{P}_n[b])$, where $\mathcal{P}_i[b]$ is the payment 
of bidder $i$ when the bid vector is $b$. 

A bidding strategy for agent⁴ $i$ is a mapping $\beta_i : [0, \infty) \to [0, \infty)$ which 
specifies agent $i$’s bid $\beta_i(v_i)$ for each possible value $v_i$ she may have. 


DEFINITION 14.1.2 (Private Values). Suppose that $n$ bidders are competing 
in a (direct) single-item auction. The bidders’ values $V_1, V_2, \ldots, V_n$ are independent 
and for each $i$ the distribution $F_i$ of $V_i$ is common knowledge.⁵ Each bidder $i$ also 
knows the realization $v_i$ of his own value $V_i$. Fix a bidding strategy $\beta_i : [0, \infty) \to 
[0, \infty)$ for each agent $i$. Note that we may restrict $\beta_i$ to the support⁶ of $V_i$. 


1 This is called a quasilinear utility model. 

2 Later we will also consider randomized allocation rules. 

3 When the auction is clear from the context, we will drop the superscripts and write $\alpha[\cdot]$ 
and $\mathcal{P}[\cdot]$ for the allocation rule and payment rule. 

4 We use the terms “agent”, “bidder”, and “player” interchangeably. 


[Figure: bidder 1 has value $v_1$ and bids $b_1$; the other bidders bid $\beta_2(V_2), \ldots, \beta_n(V_n)$. 
Labels: $a_1[b_1]$ is the probability that bidder 1 wins, $p_1[b_1]$ is his expected payment, 
and his utility is $u_1[b_1|v_1] = v_1 a_1[b_1] - p_1[b_1]$.] 

FIGURE 14.2. Illustration of basic definitions from the perspective of bidder 
1. In this figure, bidder 1 knows that each bidder $i$, for $2 \le i \le n$, has a value 
drawn independently from $F_i$ and is bidding according to the bidding strategy 
$\beta_i(\cdot)$. The allocation probability, expected payment, and expected utility of 
bidder 1 are expressed in terms of bidder 1’s bid. 


The allocation probabilities are 

$$a_i[b] := P\big[\text{bidder } i \text{ wins bidding } b \text{ when other bids are } \beta_j(V_j),\ \forall j \ne i\big] 
= E\big[\alpha_i[b, \beta_{-i}(V_{-i})]\big].$$ 

The expected payments are 

$$p_i[b] := E\big[\text{payment of bidder } i \text{ bidding } b \text{ when other bids are } \beta_j(V_j),\ \forall j \ne i\big] 
= E\big[\mathcal{P}_i[b, \beta_{-i}(V_{-i})]\big].$$ 

The expected utility of bidder $i$ with value $v_i$ bidding $b$ is 

$$u_i[b|v_i] = v_i a_i[b] - p_i[b].$$ 

The bidding strategy profile $(\beta_1, \ldots, \beta_n)$ is in Bayes-Nash equilibrium if for 
all $i$, 

$$u_i[\beta_i(v_i)|v_i] \ge u_i[b|v_i] \quad \text{for all } v_i \text{ and } b.$$ 

In words, for each bidder $i$, the bidding strategy $\beta_i$ maximizes $i$’s expected utility, 
given that for all $j \ne i$, bidder $j$ is bidding $\beta_j(V_j)$. 


5 I.e., all participants know the bidder distributions and know that each other knows and so 
on. 

6 The support of a random variable $V$ with distribution function $F$ is “the set of values it 
takes”; formally it is defined as $\mathrm{supp}(V) = \mathrm{supp}(F) := \bigcap_{\epsilon > 0} \{x \mid F(x + \epsilon) - F(x - \epsilon) > 0\}$. 


14.2. Independent private values 


Consider a first-price auction in which each player’s value $V_i$ is drawn 
independently from a distribution $F_i$. If each other bidder $j$ bids $\beta_j(V_j)$ and bidder $i$ bids 
$b$, his expected utility is 

$$u_i[b|v_i] = (v_i - b) \cdot a_i[b] = (v_i - b) \cdot P\big[b > \max_{j \ne i} \beta_j(V_j)\big]. \tag{14.1}$$ 


EXAMPLE 14.2.1 (Two-bidder first-price auction with uniformly 
distributed values). Consider a two-bidder first-price auction where the $V_i$ are 
independent and uniform on $[0,1]$. Suppose that $\beta_1 = \beta_2 = \beta$ is an equilibrium, with 
$\beta : [0,1] \to [0, \beta(1)]$ differentiable and strictly increasing. Bidder 1 with value $v_1$, 
knowing that bidder 2 is bidding $\beta(V_2)$, compares the utility of alternative bids $b$ 
to $\beta(v_1)$. (We may assume that $b \in [0, \beta(1)]$ since higher bids are dominated by 
bidding $\beta(1)$.) With this bid, the expected utility for bidder 1 is 

$$u_1[b|v_1] = (v_1 - b) \cdot P[b > \beta(V_2)]. \tag{14.2}$$ 

We can write $b = \beta(w)$ for some $w \ne v_1$ and then $P[b > \beta(V_2)] = w$. Using the 
notation⁷ 

$$u_1(w|v_1) := u_1[\beta(w)|v_1], \tag{14.3}$$ 

equation (14.2) becomes 

$$u_1(w|v_1) = (v_1 - \beta(w))\, w. \tag{14.4}$$ 

For $\beta$ to be an equilibrium, the utility $u_1(w|v_1)$ must be maximized when 
$w = v_1$, and consequently, 

$$\left.\frac{\partial u_1(w|v_1)}{\partial w}\right|_{w = v_1} = \Big[v_1 - \beta'(w) w - \beta(w)\Big]_{w = v_1} = 0.$$ 

Thus, for all $v_1$, 

$$v_1 = \beta'(v_1) v_1 + \beta(v_1) = \big(v_1 \beta(v_1)\big)'.$$ 

Integrating both sides, we obtain 

$$\frac{v_1^2}{2} = v_1 \beta(v_1) \quad \text{and so} \quad \beta(v_1) = \frac{v_1}{2}.$$ 

(We have dropped the constant term, as we assume that $\beta(0) = 0$.) 

We now verify that $\beta_1 = \beta_2 = \beta$ is an equilibrium with $\beta(v) = v/2$. Bidder 1’s 
utility when her value is $v_1$, she bids $b \in [0, 1/2]$, and bidder 2 bids $\beta(V_2) = V_2/2$ is 

$$u_1[b|v_1] = P\Big[\frac{V_2}{2} \le b\Big] (v_1 - b) = 2b(v_1 - b).$$ 

The choice of $b$ that maximizes this utility is $b = v_1/2$. Since the bidders are 
symmetric, $\beta(v) = v/2$ is indeed an equilibrium. 

Thus, in this Bayes-Nash equilibrium, the auctioneer’s expected revenue is 

$$E\Big[\max\Big(\frac{V_1}{2}, \frac{V_2}{2}\Big)\Big]. \tag{14.5}$$ 
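
The verification step of Example 14.2.1 can also be checked numerically. The sketch below is only an illustration: the function names and the grid search are assumptions, not part of the text. It computes bidder 1's expected utility against an opponent who bids $V_2/2$ with $V_2$ uniform on $[0,1]$, and searches a grid of bids for the best response.

```python
def expected_utility(b, v1):
    """Bidder 1's expected utility from bidding b with value v1,
    when bidder 2 bids V2/2 and V2 is uniform on [0, 1]."""
    win_prob = min(max(2.0 * b, 0.0), 1.0)  # P[V2/2 <= b], clipped to [0, 1]
    return (v1 - b) * win_prob

def best_bid(v1, grid_size=1000):
    """Grid search over bids in [0, 1] for the utility-maximizing bid."""
    bids = [k / grid_size for k in range(grid_size + 1)]
    return max(bids, key=lambda b: expected_utility(b, v1))
```

For each value tried, the grid maximizer should coincide with $v_1/2$ up to the grid resolution, in agreement with the derivation above.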


7 It is worth emphasizing this notational convention: We use square brackets to denote 
functions of bids and regular parentheses to denote functions of alternative valuations. 

In the example above of an equilibrium for the first-price auction, bidders must 
bid below their values, taking the distribution of competitors’ values into account. 
This contrasts with the English auction, where no such strategizing is needed. Is 
strategic bidding (that is, considering competitors’ values and potential bids) a necessary 
consequence of the convenience of sealed-bid auctions? No. In 1960, William 
Vickrey discovered that one can combine the low communication cost⁸ of sealed-bid 
auctions with the simplicity of the optimal bidding rule in ascending auctions. We 
can get a hint on how to construct this combination by determining the revenue of 
the auctioneer in the ascending auction when all bidders act rationally: The item 
is sold to the highest bidder when the current price exceeds what other bidders are 
willing to offer; this threshold price is approximately the value of the item to the 
second highest bidder. 


DEFINITION 14.2.2. In a (sealed-bid) second-price auction (also known as 
a Vickrey auction), the highest bidder wins the auction at a price equal to the 
second highest bid. 


THEOREM 14.2.3. The second-price auction is truthful.⁹ In other words, for 
each bidder $i$ and for any fixed set of bids of all other bidders, bidder $i$’s utility is 
maximized by bidding her true value $v_i$. 


PROOF. Suppose the maximum of the bids submitted by bidders other than $i$ 
is $m$. Then bidder $i$’s utility in the auction is at most $\max(v_i - m, 0)$, where the 
first term is the utility for winning and the second term is the utility for losing. For 
each possible value $v_i$, this maximum is achieved by bidding truthfully: If $v_i > m$, 
she wins and pays $m$, and if $v_i < m$, she loses and pays nothing. 
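
As a sanity check on Theorem 14.2.3 (not a substitute for the proof), one can compare the truthful bid against a grid of alternative bids. In this sketch the function names are hypothetical, and ties are broken in favor of the lowest-indexed bidder, which is an assumption since the text does not specify a tie-breaking rule.

```python
def second_price_auction(bids):
    """Return (winner index, price paid). Requires at least two bids.

    Assumption: ties go to the lowest-indexed bidder.
    """
    winner = max(range(len(bids)), key=lambda i: (bids[i], -i))
    price = max(b for j, b in enumerate(bids) if j != winner)
    return winner, price

def utility(i, value, bids):
    """Quasilinear utility of bidder i with the given value."""
    winner, price = second_price_auction(bids)
    return value - price if winner == i else 0.0

def truthful_is_best(value, other_bids, grid):
    """Check that no bid on the grid beats bidding the true value (bidder 0)."""
    truthful = utility(0, value, [value] + other_bids)
    return all(utility(0, value, [b] + other_bids) <= truthful + 1e-12
               for b in grid)

grid = [k / 20 for k in range(21)]
```

For instance, `truthful_is_best(0.6, [0.4, 0.2], grid)` compares the truthful bid 0.6 with every bid in the grid while the other two bids stay fixed.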


REMARK 14.2.4. We emphasize that the theorem statement is not merely saying 
that truthful bidding is a Nash equilibrium, but rather the much stronger statement 
that bidding truthfully is a dominant strategy; i.e., it maximizes each bidder’s 
utility no matter what the other bids are. 


In Chapter 15 and Chapter 16 we show that a variant of this auction applies 
much more broadly. For example, when an auctioneer has $k$ identical items to sell 
and each bidder wants only one, the following auction is also truthful. 


DEFINITION 14.2.5. In a (sealed-bid) k-unit Vickrey auction the top $k$ 
bidders win the auction at a price equal to the $(k+1)^{\text{st}}$ highest bid. 


EXERCISE 14.a. Prove that the k-unit Vickrey auction is truthful. 


14.3. Revenue in single-item auctions 


From the perspective of the bidders in an auction, a second-price auction is 
appealing. They don’t need to perform any complex strategic calculations. The 
appeal is less clear, however, from the perspective of the auctioneer. Wouldn’t the 
auctioneer make more money running a first-price auction? 


EXAMPLE 14.3.1. We return to our earlier example of two bidders, each with a 


value drawn independently from a U[0,1] distribution. From that analysis, we know 


8 Each bidder submits just one bid to the auctioneer.
9 An alternative term often used is incentive-compatible.

that if the auctioneer runs a first-price auction, then in equilibrium his expected
revenue will be
$$E\left[\max\left(\frac{V_1}{2}, \frac{V_2}{2}\right)\right] = \frac{1}{3}.$$


On the other hand, suppose that in the exact same setting, the auctioneer runs a 
second-price auction. Since we can assume that the bidders will bid truthfully, the 
auctioneer’s revenue will be the expected value of the second-highest bid, which is
$$E[\min(V_1, V_2)] = \frac{1}{3},$$
exactly the same as in the first-price auction!

In fact, in both cases, bidder $i$ with value $v_i$ has probability $v_i$ of winning the
auction, and the conditional expectation of his payment given winning is $v_i/2$: in
the case of the first-price auction, this is because he bids $v_i/2$, and in the case of the
second-price auction, this is because the expected bid of the other player, given that
it is below $v_i$, is $v_i/2$.
Thus, overall, in both cases, his expected payment is $v_i^2/2$.
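The two revenue computations in this example can be reproduced numerically. The sketch below is an illustration only: it averages over a midpoint grid on $[0,1]^2$ (a numerical shortcut chosen here, not something the text prescribes), using the equilibrium bid $v/2$ for the first-price auction and truthful bids for the second-price auction.

```python
def expected_revenues(grid_size=200):
    """Approximate expected revenue for two bidders with U[0,1] values.

    Returns (first_price_revenue, second_price_revenue), each averaged
    over a midpoint grid of value pairs.
    """
    points = [(k + 0.5) / grid_size for k in range(grid_size)]
    first_price = 0.0   # winner pays his own bid, beta(v) = v/2
    second_price = 0.0  # winner pays the losing (truthful) bid
    for v1 in points:
        for v2 in points:
            first_price += max(v1, v2) / 2
            second_price += min(v1, v2)
    cells = grid_size ** 2
    return first_price / cells, second_price / cells
```

Both returned numbers should be close to 1/3, in line with the example.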


Coincidence? No. As we shall see next, the amazing Revenue Equivalence 
Theorem shows that any two auction formats that have the same allocation rule in 
equilibrium yield the same auctioneer revenue! (This applies even to funky auctions 
like the all-pay auction; see below.) 


14.4. Toward revenue equivalence 


To test whether a strategy profile $\beta = (\beta_1, \beta_2, \ldots, \beta_n)$ is an equilibrium, it will
be important to determine the utility for bidder $i$ when he bids as if his value is $w \neq v_i$;
we need to show that he does not benefit from this deviation (in expectation).


We adapt the notation¹⁰ of Definition 14.1.2 as follows:


[Figure 14.3 (schematic). Bidder 1 reasons: “My value is $v_1$, but I will bid as if it is $w$.”
He submits $\beta_1(w)$ to the auction while the other bidders submit $\beta_j(V_j)$. The auction
determines $a_1(w) = a_1[\beta_1(w)]$, the probability that bidder 1 wins, and
$p_1(w) = p_1[\beta_1(w)]$, his expected payment. Bidder 1’s utility is
$u_1(w|v_1) = v_1 a_1(w) - p_1(w)$.]

FIGURE 14.3. Illustration of Definition 14.4.1 from the perspective of bidder 1.
Here, in contrast to Figure 14.2, the allocation probability, expected payment,
and expected utility of bidder 1 are expressed in terms of the value bidder 1
is “pretending” to have, as opposed to being expressed in terms of his bid.


10 As in Example 14.2.1, we use square brackets to denote functions of bids and regular
parentheses to denote functions of alternative valuations.

DEFINITION 14.4.1. Let $(\beta_i)_{i=1}^n$ be a strategy profile for $n$ bidders with independent
values $V_1, V_2, \ldots, V_n$. We assume that the distribution of each $V_i$, $i = 1, \ldots, n$,
is common knowledge among all the bidders. Suppose bidder $i$, knowing his own
value $v_i$ and knowing that the other bidders $j \neq i$ are bidding $\beta_j(V_j)$, bids $\beta_i(w)$.
Recalling Definition 14.1.1, we define the following:

• The allocation probability to bidder $i$ is $a_i(w) := a_i[\beta_i(w)]$.
• His expected payment¹¹ is $p_i(w) := p_i[\beta_i(w)]$.
• His expected utility is $u_i(w|v_i) := u_i[\beta_i(w)|v_i] = v_i a_i(w) - p_i(w)$.

In this notation, $(\beta_i)_{i=1}^n$ is in Bayes-Nash equilibrium (BNE) only if¹²
$$u_i(v_i|v_i) \ge u_i(w|v_i) \quad \text{for all } i,\ v_i, \text{ and } w. \tag{14.6}$$


14.4.1. I.I.D. bidders. Consider the setting of $n$ bidders, with i.i.d. values
drawn from a distribution with strictly increasing distribution function $F$. Since
the bidders are all symmetric, it’s natural to look for symmetric equilibria; i.e.,
$\beta_i = \beta$ for all $i$. As above (dropping the subscript in $a_i(\cdot)$ due to symmetry), let

$$a(v) = P[\text{item allocated to } i \text{ with bid } \beta(v) \text{ when other bidders bid } \beta(V_j)].$$

Consider any auction in which the item goes to the highest bidder (as in a first-price
or second-price auction). If $\beta(\cdot)$ is strictly increasing, then

$$a(w) = P\Big[\beta(w) > \max_{j \neq i} \beta(V_j)\Big] = P\Big[w > \max_{j \neq i} V_j\Big] = F(w)^{n-1}.$$

If bidder $i$ bids $\beta(w)$, his expected utility is
$$u(w|v_i) = v_i a(w) - p(w). \tag{14.7}$$


Assume $p(w)$ and $a(w)$ are differentiable. For $\beta$ to be an equilibrium, it must be
that for all $v_i$, the derivative $v_i a'(w) - p'(w)$ vanishes at $w = v_i$, so

$$p'(v_i) = v_i a'(v_i) \quad \text{for all } v_i.$$

Hence, if $p(0) = 0$, we get
$$p(v_i) = \int_0^{v_i} v\, a'(v)\, dv,$$

which yields, via integration by parts, that

$$p(v_i) = v_i a(v_i) - \int_0^{v_i} a(w)\, dw. \tag{14.8}$$


In other words, the expected payment of a bidder with value v is the same in any 
auction that allocates to the highest bidder. Hence, all such auctions yield the same 
expected revenue to the auctioneer. 
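As a worked instance of (14.8), offered here only as an illustration, take $n$ bidders with U[0,1] values, so that $F(w) = w$ and $a(w) = w^{n-1}$ on $[0,1]$:

```latex
\begin{align*}
p(v) &= v\,a(v) - \int_0^{v} a(w)\,dw
      = v \cdot v^{n-1} - \int_0^{v} w^{n-1}\,dw \\
     &= v^{n} - \frac{v^{n}}{n}
      = \frac{n-1}{n}\,v^{n}.
\end{align*}
```

For $n = 2$ this gives $p(v) = v^2/2$, which agrees with the expected payment computed in Example 14.3.1.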


11 We will usually assume that $p_i(0) = 0$, as this holds in most auctions.
12 Conversely, if (14.6) holds and $u_i(v_i|v_i) \ge u_i[b|v_i]$ for all $i$, $b \notin \mathrm{Image}(\beta_i)$, and $v_i$, then
$(\beta_i)_{i=1}^n$ is a BNE.


Licensed to AMS. 
License or copyright restrictions may apply to redistribution; see http://www.ams.org/publications/ebooks/terms 




14.4.2. Payment and revenue equivalence. The following theorem formalizes
the discussion in the previous section.

THEOREM 14.4.2 (Revenue Equivalence). Suppose that each agent’s value
$V_i$ is drawn independently from the same strictly increasing distribution $F$ on $[0, h]$.
Consider any $n$-bidder single-item auction in which the item is allocated to the
highest bidder, and $u_i(0) = 0$ for all $i$. Assume that the bidders employ a symmetric
strategy profile $\beta_i := \beta$ for all $i$, where $\beta$ is strictly increasing in $[0, h]$.

(i) If $(\beta, \ldots, \beta)$ is a Bayes-Nash equilibrium, then for a bidder with value $v$,
$$a(v) = F(v)^{n-1} \quad \text{and} \quad p(v) = v\,a(v) - \int_0^v a(w)\,dw. \tag{14.9}$$

(ii) If (14.9) holds for the strategy profile $(\beta, \ldots, \beta)$, then for any bidder $i$
with utility $u(\cdot|\cdot)$ and all $v, w \in [0, h]$,

$$u(v|v) \ge u(w|v). \tag{14.10}$$


REMARK 14.4.3. Part (ii) of the theorem implies that if (14.9) holds for a
symmetric strategy profile, then these bidding strategies are an equilibrium relative
to alternatives in the image of $\beta(\cdot)$. In fact, showing that (14.9) holds can be an
efficient way to prove that $(\beta, \ldots, \beta)$ is a Bayes-Nash equilibrium since strategies
that are outside the range of $\beta$ can often be ruled out directly. We will see this in
the examples below.


PROOF OF THEOREM 14.4.2. In the previous section, we proved (i) under the
assumption that $a(\cdot)$ and $p(\cdot)$ are differentiable; a proof of (i) without these
assumptions is given in §14.6. For (ii), it follows from (14.9) that

$$u(v|v) = v\,a(v) - p(v) = \int_0^v a(z)\,dz,$$

whereas

$$u(w|v) = v\,a(w) - p(w) = (v - w)\,a(w) + \int_0^w a(z)\,dz.$$

If $v > w$, then

$$u(v|v) - u(w|v) = \int_w^v [a(z) - a(w)]\,dz > 0$$

since $a(v) = F(v)^{n-1}$ is an increasing function. The case $v < w$ is similar. Thus,
for all $v, w \in [0, h]$, $u(v|v) \ge u(w|v)$.


COROLLARY 14.4.4. Under the assumptions of Theorem 14.4.2,

$$p(v) = F(v)^{n-1}\, E\Big[\max_{i \le n-1} V_i \,\Big|\, \max_{i \le n-1} V_i \le v\Big]. \tag{14.11}$$

PROOF. Since the truthful second-price auction allocates to the highest bidder,
we can use it to calculate $p(v)$: The probability that a bidder wins is $a(v) =
F(v)^{n-1}$, and if he wins, his payment has the distribution of $\max_{i \le n-1} V_i$ given
that this maximum is at most $v$.

14.4.3. Applications. We now use this corollary to derive equilibrium strategies
in a number of auctions.

First-price auction: (a) Suppose that $\beta$ is strictly increasing on $[0, h]$ and defines a
symmetric equilibrium. Then $a(v)$ and $p(v)$ are given by (14.9). Since the expected
payment $p(v)$ in a first-price auction is $F(v)^{n-1}\beta(v)$, it follows that

$$\beta(v) = E\Big[\max_{i \le n-1} V_i \,\Big|\, \max_{i \le n-1} V_i \le v\Big] = \int_0^v \Big[1 - \Big(\frac{F(w)}{F(v)}\Big)^{n-1}\Big]\,dw. \tag{14.12}$$

The rightmost expression in (14.12) follows from the general formula

$$E[Z] = \int_0^\infty P[Z \ge w]\,dw$$

for nonnegative random variables, applied to $Z = \max_i V_i$ conditioned on $Z \le v$.

(b) Suppose that $\beta$ is defined by the preceding equation. We verify that this
formula actually defines an equilibrium. Since $F$ is strictly increasing, by the
preceding equation, $\beta$ is also strictly increasing. Therefore $a(v) = F(v)^{n-1}$ and (14.9)
holds. Hence (14.10) holds. Finally, bidding more than $\beta(h)$ is dominated by
bidding $\beta(h)$. Hence this bidding strategy is in fact an equilibrium.


Examples:

With $n$ bidders, each with a value that is U[0,1], we obtain that $\beta(v) = \frac{n-1}{n}\,v$.

With 2 bidders, each with a value that is exponential with parameter 1 (that
is, $F(v) = 1 - e^{-v}$), we obtain

$$\beta(v) = \int_0^v \Big(1 - \frac{1 - e^{-w}}{1 - e^{-v}}\Big)\,dw = 1 - \frac{v e^{-v}}{1 - e^{-v}}.$$

This function is always below 1. Thus, even a bidder with a value of 100 will bid
below 1 in equilibrium!
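The equilibrium bid in a first-price auction, $\beta(v) = \int_0^v [1 - (F(w)/F(v))^{n-1}]\,dw$, can also be evaluated numerically for other distributions. The following is a rough sketch using the midpoint rule; the function names and the choice of integration scheme are assumptions made for illustration.

```python
import math


def first_price_bid(v, cdf, n, steps=2000):
    """Approximate the symmetric first-price equilibrium bid with the midpoint rule.

    Integrates 1 - (F(w)/F(v))**(n-1) over w in [0, v].
    """
    if v <= 0:
        return 0.0
    width = v / steps
    total = 0.0
    for k in range(steps):
        w = (k + 0.5) * width  # midpoint of the k-th subinterval
        total += 1.0 - (cdf(w) / cdf(v)) ** (n - 1)
    return total * width


def uniform_cdf(x):
    """Distribution function of U[0,1]."""
    return min(max(x, 0.0), 1.0)


def exponential_cdf(x):
    """Distribution function of an exponential with parameter 1."""
    return 1.0 - math.exp(-x)
```

With `uniform_cdf` the result should track $\frac{n-1}{n}v$, and with `exponential_cdf` and `n = 2` it should track $1 - v e^{-v}/(1 - e^{-v})$.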


All-pay auction: This auction allocates to the player that bids the highest but 
charges every player their bid. For example, architects competing for a construction 
project submit design proposals. While only one architect wins the contest, all 
competitors expend the effort to prepare their proposals. Thus, participants need 
to make the strategic decision as to how much effort to put in. 

In an all-pay auction, $\beta(v) = p(v)$, and therefore by Corollary 14.4.4 we find
that the only symmetric increasing equilibrium is given by

$$\beta(v) = F(v)^{n-1}\, E\Big[\max_{i \le n-1} V_i \,\Big|\, \max_{i \le n-1} V_i \le v\Big].$$

For example, if $F$ is uniform on [0,1], then $\beta(v) = \frac{n-1}{n}\,v^n$.
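As a consistency check on the uniform all-pay example (a short worked computation added for illustration), every one of the $n$ bidders pays his bid $\beta(V) = \frac{n-1}{n}V^n$, so the expected revenue is

```latex
\begin{align*}
n\,E[\beta(V)] &= n \cdot \frac{n-1}{n} \int_0^1 v^{n}\,dv
               = \frac{n-1}{n+1},
\end{align*}
```

which equals the expected second-highest of $n$ independent U[0,1] values, i.e., the expected revenue of the second-price auction. For $n = 2$ both are $1/3$.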


War of attrition auction: This auction allocates to the highest bidder, charges
him the second-highest bid, and charges all other bidders their bid. For example,
animals fighting over territory expend energy. A winner emerges when the fighting
ends, and each animal has expended energy up to the point at which he dropped
out or, in the case of the winner, until he was the last one left. See the third remark
after Example 4.2.3 for another example of a war of attrition and Exercise 14.5 for
the analysis of this auction.


FIGURE 14.4. A war of attrition auction. 


EXERCISE 14.b. In the Indian Premier League (IPL), cricket franchises can acquire
a player by participating in the annual auction. The rules of the auction are
as follows. An English auction is run until either only one bidder remains or the 
price reaches $m (for example $m could be $750,000). In the latter case, a sealed- 
bid first-price auction is run with the remaining bidders. (Each of these bidders 
knows how many other bidders remain). 

Use the Revenue Equivalence Theorem to determine equilibrium bidding strate- 
gies in an IPL cricket auction for a player with n competing franchises. Assume 
that the value each franchise has for this player is uniform from 0 to 1 million. 


14.5. Auctions with a reserve price 


We have seen that in equilibrium, with players whose values for the item being 
sold are drawn independently from the same distribution, the expected seller rev- 
enue is the same for any auction that always allocates to the highest bidder. How 
should the seller choose which auction to run? As we have discussed, an appealing 
feature of the second-price auction is that it induces truthful bidding. On the other 
hand, the auctioneer’s revenue might be lower than his own value for the item. A 
notorious example was the 1990 New Zealand sale of spectrum licenses in which a 
second-price auction was used; the winning bidder bid $100,000 but paid only $6! 
A natural remedy for situations like this is for the auctioneer to impose a reserve 
price. 


DEFINITION 14.5.1. The Vickrey auction (or second-price auction) with
a reserve price $r$ is a sealed-bid auction in which the item is not allocated if all
bids are below $r$. Otherwise, the item is allocated to the highest bidder, who pays
the maximum of the second-highest bid and $r$.

A virtually identical argument to that of Theorem 14.2.3 shows that the Vickrey
auction with a reserve price is truthful. Alternatively, the truthfulness follows by
imagining that there is an extra bidder whose value/bid is the reserve price.

Perhaps surprisingly, an auctioneer may want to impose a reserve price even if 
his own value for the item is zero. For example, we have seen that for two bidders 
with values independent and drawn from U[0, 1], all auctions that allocate to the 
highest bidder have an expected auctioneer revenue of 1/3. 

Now consider the expected revenue if, instead, the auctioneer uses the Vickrey
auction with a reserve of $r$. Relative to the case of no reserve price, the auctioneer
loses an expected revenue of $r/3$ if both bidders have values below $r$, which happens
with probability $r^2$, for a total expected loss of $r^3/3$. On the other hand, he gains
if one bidder is above $r$ and one below. This occurs with probability $2r(1 - r)$, and
the gain is $r$ minus the expected value of the bidder below $r$; i.e., $r - r/2$.
Altogether, the expected revenue is

$$\frac{1}{3} - \frac{r^3}{3} + 2r(1 - r)\,\frac{r}{2} = \frac{1}{3} + r^2 - \frac{4}{3}\,r^3.$$

Differentiating shows that this is maximized at $r = 1/2$, yielding an expected
auctioneer revenue of 5/12. (This is not a violation of the Revenue Equivalence
Theorem because imposition of a reserve price changes the allocation rule.)
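The optimization over the reserve price is easy to confirm numerically. The sketch below evaluates the revenue expression $\frac13 + r^2 - \frac43 r^3$ for two U[0,1] bidders on a grid; the function names and the grid search are illustrative choices, not part of the text.

```python
def reserve_revenue(r):
    """Expected revenue 1/3 + r^2 - (4/3) r^3 for two U[0,1] bidders."""
    return 1 / 3 + r ** 2 - (4 / 3) * r ** 3


def best_reserve(grid_size=1000):
    """Grid search for the revenue-maximizing reserve price on [0, 1]."""
    candidates = [k / grid_size for k in range(grid_size + 1)]
    return max(candidates, key=reserve_revenue)
```

Calling `best_reserve()` should return a value at (or within one grid step of) 1/2, where the revenue is 5/12, compared with 1/3 at $r = 0$.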

Remarkably, this simple auction optimizes the auctioneer’s expected revenue
over all possible auctions. It is a special case of Myerson’s optimal auction, a broadly
applicable technique for maximizing auctioneer revenue when agents’ values are
drawn from known prior distributions. We develop the theory of optimal auctions
in §14.9.


14.5.1. Revenue equivalence with reserve prices. Theorem 14.4.2
generalizes readily to the case of single-item auctions in which the item is allocated
to the highest bidder, as long as the bid is above the reserve price $r$. (See also
Theorem 14.6.1.) The only change in the theorem statement is that the allocation
probability becomes

$$a(v) = F(v)^{n-1}\,\mathbb{1}_{\{v \ge r\}}.$$

As in §14.4.3, this enables us to solve for equilibrium bidding strategies in auctions
with a reserve price. See Exercise 14.3.


14.5.2. Entry fee versus reserve price. Consider a second-price auction
with an entry fee of $\delta$. That is, to enter the auction, a bidder must pay $\delta$, and then
a standard second-price auction (no reserve) is run. Fix a bidder $i$ and suppose
that all bidders $j \neq i$ employ the following threshold strategy: Bidder $j$ with value
$v_j$ enters the auction if and only if $v_j \ge r := r(\delta)$ and then bids truthfully.

Then it is a best response for bidder $i$ to employ the same strategy if

$$r F(r)^{n-1} = \delta.$$

To see this, let $u_E(v)$ be the overall utility to bidder $i$ with value $v$ if he pays the
entry fee and bids truthfully.¹³ If $v \ge r$, then $u_E(v) \ge -\delta + vF(r)^{n-1} \ge 0$, since
the probability that none of the other bidders enter the auction is $F(r)^{n-1}$. On the
other hand, if $v < r$, then $u_E(v) = -\delta + vF(r)^{n-1} < 0$.


13 Given that he enters, he is participating in a second-price auction and it is a dominant
strategy to bid truthfully.

Now, let us compare the second-price auction with entry fee of $\delta = rF(r)^{n-1}$
to a second-price auction with a reserve price of $r$. Clearly, in both cases,

$$a(v) = F(v)^{n-1}\,\mathbb{1}_{\{v \ge r\}}.$$

Moreover, the expected payment for a bidder in both cases is the same: If $v < r$,
the expected payment is 0 for both. For $v \ge r$ and the auction with entry fee,

$$p(v) = \delta + \big(F(v)^{n-1} - F(r)^{n-1}\big)\, E\Big[\max_{i \le n-1} V_i \,\Big|\, r \le \max_{i \le n-1} V_i \le v\Big],$$

whereas with the reserve price,

$$p(v) = rF(r)^{n-1} + \big(F(v)^{n-1} - F(r)^{n-1}\big)\, E\Big[\max_{i \le n-1} V_i \,\Big|\, r \le \max_{i \le n-1} V_i \le v\Big].$$

This means that $u(w|v) = v\,a(w) - p(w)$ is the same in both auctions. (This gives
another proof that the threshold strategy is a Bayes-Nash equilibrium in the
entry-fee auction.)

Notice, however, that the payment of a bidder with value $v$ can differ in the
two auctions. For example, if bidder 1 has value $v > r$ and all other bidders have
value less than $r$, then in the entry-fee auction, bidder 1’s payment is $\delta = rF(r)^{n-1}$,
whereas in the reserve-price auction it is $r$. Moreover, if bidder 1 has value $v > r$,
but there is another bidder with a higher value, then in the entry-fee auction, bidder
1 loses the auction but still pays $\delta$, whereas in the reserve-price auction he pays
nothing. Thus, when the entry-fee auction is over, a bidder may regret having
participated. This means that this auction is ex-interim individually rational, but
not ex-post individually rational. See the definitions in §14.5.4.
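The contrast between equal expected payments and different realized payments can be made concrete. The sketch below is an illustration under added assumptions: two bidders with U[0,1] values (so the entry fee is $\delta = r \cdot F(r) = r^2$), both following the threshold strategy, and averaging over a midpoint grid for the rival’s value. All function names are hypothetical.

```python
def payments(v, other, r):
    """Bidder 1's realized payment as (entry_fee_auction, reserve_price_auction).

    Assumes two bidders with U[0,1] values, entry fee delta = r**2,
    and truthful bidding by every bidder who participates.
    """
    delta = r * r
    if v < r:
        return 0.0, 0.0              # stays out, or bids below the reserve
    if other < r:
        return delta, r              # alone: pays only the fee, or the reserve
    if v > other:
        return delta + other, other  # wins against an active rival
    return delta, 0.0                # loses, but the entry fee is sunk


def expected_payments(v, r, grid_size=10000):
    """Average both payments over a midpoint grid of the rival's value."""
    fee_total = 0.0
    reserve_total = 0.0
    for k in range(grid_size):
        fee, reserve = payments(v, (k + 0.5) / grid_size, r)
        fee_total += fee
        reserve_total += reserve
    return fee_total / grid_size, reserve_total / grid_size
```

For example, with $v = 0.8$ and $r = 0.5$ the two averages should coincide, even though `payments(0.8, 0.9, 0.5)` shows the losing bidder paying the fee in one format and nothing in the other.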


14.5.3. Evaluation fee. 


EXAMPLE 14.5.2. The queen is running a second-price auction to sell her crown
jewels. However, she plans to charge an evaluation fee: A bidder must pay $\phi$ in order
to examine the jewels and determine how much he values them prior to bidding in
the auction.

Assume that bidder $i$’s value $V_i$ for the jewels is a random variable and that
the $V_i$’s are i.i.d. Prior to the evaluation, he only knows the distribution of $V_i$. He
will learn the realization of $V_i$ only if he pays the evaluation fee. In this situation,
as long as the fee is below the bidder’s expected utility in the second-price auction,
i.e.,

$$\phi < E[u(V_i|V_i)],$$

the bidder has an incentive to pay the evaluation fee. Thus, the seller can charge
an evaluation fee equal to the bidder’s expected utility minus some $\epsilon > 0$. The
expected auctioneer revenue from bidder $i$’s evaluation fee and his payment in the
ensuing second-price auction is

$$E[u(V_i|V_i)] - \epsilon + E[p(V_i)] = E[a_i(V_i) \cdot V_i] - \epsilon.$$

Since, in a second-price auction, the allocation is to the bidder with the highest
value, the seller’s expected revenue is

$$E[\max(V_1, \ldots, V_n)] - n\epsilon,$$

which is essentially best possible.

FIGURE 14.5. The queen has fallen on hard times and must sell her crown jewels.


14.5.4. Ex-ante versus ex-interim versus ex-post. An auction, with an
associated equilibrium $\beta$, is called individually rational (IR) if each bidder’s
expected utility is nonnegative. The examples given above illustrate three different
notions of individual rationality: ex-ante, when bidders know only the value
distributions; ex-interim, when each bidder also knows his own value; and ex-post, after
the auction concludes.

To formalize this, recall that $u(w_i|v_i)$ denotes the expectation (over $V_{-i}$) of
the utility of bidder $i$ when he bids $\beta_i(w_i)$, his value is $v_i$, and each other bidder $j$
bids $\beta_j(V_j)$.

• An auction, with an associated equilibrium $\beta$, is ex-ante individually
rational if, knowing only the distribution of his value and of the other
bidders’ values, each bidder $i$’s expected utility is nonnegative; i.e.,

$$E[u(V_i|V_i)] \ge 0. \tag{14.13}$$

The outside expectation here is over $V_i$. The evaluation fee auction of
§14.5.3 is ex-ante IR.
• An auction, with an associated equilibrium, is ex-interim individually
rational if for each bidder $i$,

$$u(v_i|v_i) \ge 0.$$

The entry-fee auction of §14.5.2 is ex-interim IR.
• An auction, with an associated equilibrium $\beta$, is ex-post individually
rational if for each bidder $i$,

$$u[\beta_i(v_i), b_{-i}|v_i] \ge 0,$$

where $u[b_i, b_{-i}|v_i]$ is the utility of player $i$ when his value is $v_i$, he bids $b_i$,
and the other players bid $b_{-i}$. The standard first-price and second-price
auctions are ex-post IR since in these auctions a bidder never pays more
than his bid, and in the equilibrium bidding strategy, a bidder never bids
above his value.


14.6. Characterization of Bayes-Nash equilibrium 


In §14.4.2 we saw that, with i.i.d. bidders, all auctions that allocate to the
highest bidder result in the same auctioneer revenue in equilibrium. Revenue
equivalence holds for other allocation rules, e.g., with reserve prices and randomization,¹⁴
if bidders are assumed to be i.i.d.

The next theorem generalizes Theorem 14.4.2 and will allow us to design
revenue-maximizing auctions in §14.9.


THEOREM 14.6.1. Let $A$ be a (possibly randomized) auction for selling a single
item, where bidder $i$’s value $V_i$ is drawn independently from $F_i$. Suppose that $F_i$ is
strictly increasing and continuous on $[0, h_i]$, with $F_i(0) = 0$ and $F_i(h_i) = 1$. ($h_i$ can
be $\infty$.)

(a) If $(\beta_1, \ldots, \beta_n)$ is a Bayes-Nash equilibrium, then for each agent $i$:

(1) The probability of allocation $a_i(v)$ is weakly increasing in $v$.
(2) The utility $u_i(v)$ is a convex function of $v$, with

$$u_i(v) = \int_0^v a_i(z)\,dz + u_i(0).$$

(3) The expected payment is determined by the allocation probabilities up to a
constant $p_i(0)$:

$$p_i(v) = v\,a_i(v) - \int_0^v a_i(z)\,dz + p_i(0).$$

(b) Conversely, if $(\beta_1, \ldots, \beta_n)$ is a set of bidder strategies for which (1) and
(3) hold, or (1) and (2), then for all bidders $i$ and values $v$ and $w$

$$u_i(v|v) \ge u_i(w|v). \tag{14.14}$$


REMARK 14.6.2. See Figure 14.6 for an illustration of the theorem statement
and a “proof by picture”.


REMARK 14.6.3. This theorem implies that two auctions with the same allocation
rule yield the same expected payments and auctioneer revenue in equilibrium
(assuming $p_i(0) = 0$ for all $i$), without making any smoothness assumptions.
However, if bidders’ valuations are not i.i.d., the equilibrium will not be symmetric (i.e.,
$\beta_i(\cdot) \neq \beta_j(\cdot)$ for $i \neq j$), so the first-price auction need not allocate to the highest
bidder. Thus, the first-price and second-price auctions are not revenue equivalent for
asymmetric bidders.


14 A randomized auction is defined as in Definition 14.1.1, but $\alpha_i[\mathbf{b}]$ represents the allocation
probability to bidder $i$ and takes values in [0,1]. In a randomized auction, $\mathcal{P}_i[\mathbf{b}]$ is the expected
payment of bidder $i$ on bid vector $\mathbf{b}$. Definition 14.1.2 and Definition 14.4.1 remain unchanged.

PROOF. (a): Suppose that $(\beta_1, \ldots, \beta_n)$ is a Bayes-Nash equilibrium. In what
follows, all quantities refer to bidder $i$, so for notational simplicity we usually drop
the subscript $i$. Also, we use the shorthand notation $u(v) := u(v|v)$.

Consider two possible values that bidder $i$ might have. If he has value $v$, then
he has higher utility bidding $\beta_i(v)$ than $\beta_i(w)$; i.e.,

$$u(w|v) = v\,a(w) - p(w) \le u(v) = v\,a(v) - p(v). \tag{14.15}$$

Similarly, if he has value $w$,

$$w\,a(w) - p(w) = u(w) \ge w\,a(v) - p(v) = u(v|w). \tag{14.16}$$

Subtracting (14.16) from (14.15), we get

$$(v - w)\,a(w) \le u(v) - u(w) \le (v - w)\,a(v).$$

Comparing the right and left sides of this inequality shows that $a(\cdot)$ is (weakly)
increasing. Moreover, the left-hand inequality means that $a(w)$ is a subgradient
of $u(\cdot)$ at $w$ (see Appendix C for more on convex functions). It then follows
that $u(\cdot)$ is convex and satisfies

$$u(v) = \int_0^v a(z)\,dz + u(0).$$

Finally, since $u(v) = v\,a(v) - p(v)$, (3) follows.

(b): For the converse, from condition (3) (or (2)) it follows that

$$u(v) = \int_0^v a(z)\,dz + u(0),$$

whereas

$$u(w|v) = v\,a(w) - p(w) = (v - w)\,a(w) + \int_0^w a(z)\,dz + u(0),$$

whence, by condition (1),

$$u(v) \ge u(w|v).$$


REMARK 14.6.4. Another way to see that $u(\cdot)$ is convex is to observe that

$$u(v) = u(v|v) = \sup_w u(w|v) = \sup_w \{v\,a(w) - p(w)\},$$

and thus $u(v)$ is the supremum of affine functions. (See Appendix C for more on
convex functions.)


EXERCISE 14.c. Extend Theorem 14.6.1 to the case of an auction for selling $k$
identical items, where each bidder wants only one item. (Other than the fact that
$k$ items are being sold, as opposed to one, the theorem statement is unchanged.)


EXAMPLE 14.6.5. Uniform price auction: The industry which sells tickets
to concerts and athletic events does not necessarily operate efficiently or, in some
cases, profitably due to complex and unknown demand. This can lead to tickets
being sold on secondary markets at exorbitant prices and/or seats being unfilled.
An auction format, known as a uniform price auction, has been proposed in order
to mitigate these problems. The auction works as follows: Prices start out high
and drop until the tickets sell out; however, all buyers pay the price at which the
final ticket is sold.

[Figure 14.6 (three plots of the probability of allocation against the bidder’s value): the top
plot marks $a_i(v_i)$ and the expected utility $u_i(v_i|v_i)$; the two bottom plots mark the
allocation probabilities at the alternative values $w$ and $w'$.]

FIGURE 14.6. The top figure illustrates how the expected payment $p_i(\cdot)$ and
utility are determined by the allocation function $a_i(\cdot)$ via equation (14.8).
The bottom figures show how a monotone allocation function and payments
determined by (14.8) ensure that no bidder has an incentive to bid as if he
had a value other than his true value. Consider a bidder with value $v_i$. On
the left side, we see what happens to the bidder’s utility if he bids as if his
value is $w < v_i$. In this case, $u_i(w|v_i) = v_i a_i(w) - p_i(w) \le u_i(v_i|v_i)$. On the
right side we see what happens to the bidder’s expected utility if he bids as
if his value is $w' > v_i$. In this case, $u_i(w'|v_i) = v_i a_i(w') - p_i(w')$ is less than
the expected utility $u_i(v_i|v_i)$ he would have obtained by bidding $\beta(v_i)$ by an
amount equal to the small blue striped area.


EXERCISE 14.d. Use the result of the Revenue Equivalence Theorem
(Theorem 14.4.2) and the equilibrium of the $k$-unit Vickrey auction to show
that in a uniform price auction with $n$ bidders and $k$ tickets, where each bidder
wants one ticket, the following bidding strategy $\beta(v)$ is in Bayes-Nash equilibrium:

$$\beta(v) = E\Big[\max_{i \le n-k} V_i \,\Big|\, \max_{i \le n-k} V_i \le v\Big].$$

14.7. Price of anarchy in auctions


Consider an auctioneer selling a single item via a first-price auction. The
social surplus $V(\mathbf{b})$ of the auction,¹⁵ when the submitted bid vector is $\mathbf{b}$, is the
sum of the utilities of the bidders and the auctioneer utility (i.e., revenue). Since
only the winning bidder has nonzero utility and the auctioneer revenue equals the
winning bid, we have

$$V(\mathbf{b}) := \text{value of winning bidder}.$$


For bidders with i.i.d. values, we derived in (14.12) symmetric equilibrium
strategies. For such equilibria, the winning bidder is always the bidder with the
highest value; i.e., social surplus is maximized.

It is more difficult to solve for equilibria when bidders’ values are not i.i.d.: In
general, this is an open problem. Moreover, equilibria are no longer symmetric.
When there are two bidders, say bidder 1 with $V_1 \sim U[0,1]$ and bidder 2 with
$V_2 \sim U[0,2]$, the bidder with the higher value may not win in equilibrium. The
intuition is that from the perspective of bidder 1, the weaker bidder, the competition
is more fierce than it was when he faced a bidder whose value was drawn from the
same distribution as his. Thus, bidder 1 will have to bid a bit more aggressively.
On the other hand, bidder 2 faces weaker competition than he would in an i.i.d.
environment and so can afford to bid less aggressively. This suggests that there will
be valuations $v_1 < v_2$ for which $\beta_1(v_1) > \beta_2(v_2)$, and in such a scenario the bidder
with the higher value will lose the auction. See Exercise 14.10.

In this section, without actually deriving equilibrium strategies, we will show 
that the expected social surplus in any Bayes-Nash equilibrium is still within a 
constant factor of optimal. 


THEOREM 14.7.1. Let $A$ be a first-price auction (with an arbitrary tie-breaking
rule) for selling a single item. Suppose the bidder values $(V_1, \ldots, V_n)$ are drawn from
the joint distribution $F$. (The values could be correlated.) Let $(\beta_1(\cdot), \ldots, \beta_n(\cdot))$ be
a Bayes-Nash equilibrium and let $V^*$ be the value of the winning bidder. Then

$$E[V^*] \ge \frac{E[\max_i V_i]}{2}.$$

That is, the price of anarchy¹⁶ (with respect to social surplus) in BNE is at most 2.

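PROOF. For any bid vector $\mathbf{b}$, let $u_i[\mathbf{b}|v_i]$ denote bidder $i$’s utility when the
bids are $\mathbf{b} = (b_1, \ldots, b_n)$ and her value is $v_i$. If bidder $i$ bids $v_i/2$ instead of $b_i$, her
utility will be $v_i/2$ if she wins and 0 otherwise. Thus,

$$u_i\Big[\frac{v_i}{2}, \mathbf{b}_{-i} \,\Big|\, v_i\Big] \ge \frac{v_i}{2}\,\mathbb{1}_{\{v_i/2 > \max_{j \neq i} b_j\}}. \tag{14.17}$$

(14.17) is an inequality only because of the possibility that $v_i/2 = \max_{j \neq i} b_j$ and $i$
wins the auction; i.e., the auctioneer breaks ties in favor of $i$. It follows that

$$\sum_i u_i\Big[\frac{v_i}{2}, \mathbf{b}_{-i} \,\Big|\, v_i\Big] \ge \sum_i \frac{v_i}{2}\,\mathbb{1}_{\{v_i/2 > \max_{j \neq i} b_j\}} \ge \max_i \frac{v_i}{2} - \max_j b_j. \tag{14.18}$$

This latter inequality clearly holds if the right-hand side is negative or 0. If it is
positive, the inequality follows by considering the summand for which $v_i$ is maximized.


15 Social surplus is also called “social welfare” or “efficiency”.
16 See Chapter 8 for an introduction to this concept.

Setting $b_i = \beta_i(V_i)$ in (14.18) and taking expectations, we obtain

$$E\Big[\sum_i u_i\Big[\frac{V_i}{2}, \beta_{-i}(V_{-i}) \,\Big|\, V_i\Big]\Big] \ge E\Big[\max_i \frac{V_i}{2} - \max_j \beta_j(V_j)\Big]. \tag{14.19}$$

Thus, under Bayes-Nash equilibrium bidding, the social surplus satisfies

$$\begin{aligned}
E[V^*] &= E\Big[\sum_i u_i(V_i|V_i)\Big] + E\Big[\max_j \beta_j(V_j)\Big] && \text{since the revenue is } \max_j \beta_j(V_j) \\
&\ge E\Big[\sum_i u_i\Big[\frac{V_i}{2}, \beta_{-i}(V_{-i}) \,\Big|\, V_i\Big]\Big] + E\Big[\max_j \beta_j(V_j)\Big] && \text{since } \beta \text{ is a BNE} \\
&\ge E\Big[\max_i \frac{V_i}{2}\Big] && \text{by (14.19).}
\end{aligned}$$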


REMARK 14.7.2. This bound of 1/2 on the price of anarchy can be improved
to $1 - 1/e$. See Exercise 14.11 for the derivation of this improved bound.
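The deterministic step in the proof of Theorem 14.7.1, the second inequality in (14.18), can be sanity-checked by brute force on a small grid of values and bids. This is only an illustrative sketch: the function names are hypothetical, bids are assumed nonnegative, and a finite check is of course not a proof.

```python
from itertools import product


def indicator_term(i, values, bids):
    """The summand (v_i/2) * 1{v_i/2 > max of the other bids} for bidder i."""
    others = max(b for j, b in enumerate(bids) if j != i)
    half = values[i] / 2
    return half if half > others else 0.0


def inequality_14_18_holds(values, bids):
    """Check sum_i (v_i/2) 1{...} >= max_i v_i/2 - max_j b_j for one instance."""
    lhs = sum(indicator_term(i, values, bids) for i in range(len(values)))
    return lhs >= max(values) / 2 - max(bids)


def check_grid(grid, bidders=2):
    """Check the inequality for every value and bid vector drawn from `grid`."""
    return all(inequality_14_18_holds(v, b)
               for v in product(grid, repeat=bidders)
               for b in product(grid, repeat=bidders))
```

For instance, `check_grid([0.0, 0.25, 0.5, 0.75, 1.0])` runs through all 625 two-bidder combinations on that grid.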


14.8. The Revelation Principle 


In some auctions, the communication between the bidders and the auctioneer 
is involved; e.g., the English auction, the IPL auction, and the entry-fee auction 
all involve multiple rounds. In most auction formats we’ve seen, however, the 
communication between the bidders and the auctioneer is simple: Each bidder 
submits a single bid. But even when the communication is restricted to a single bid, 
as in a first-price sealed-bid auction, determining the equilibrium bid requires each 
bidder to know the distributions of other bidders’ values and might be complicated 
to compute. 

An extremely useful insight, known as the Revelation Principle, shows that,
for every auction with a Bayes-Nash equilibrium, there is another “equivalent”
direct¹⁷ auction in which bidding truthfully is a Bayes-Nash equilibrium.


DEFINITION 14.8.1. If bidding truthfully (i.e., $\beta_i(v) = v$ for all $v$ and $i$) is a
Bayes-Nash equilibrium for auction $A$, then $A$ is said to be Bayes-Nash
incentive-compatible (BIC).


Consider a first-price auction A in which each bidder’s value is drawn from a 
known prior distribution F. A bidder that is not adept at computing his equilibrium 
bid might hire a third party to do this for him and submit bids on his behalf. 

The Revelation Principle changes this perspective and considers the bidding
agents and auction together as a new, more complex, auction $\tilde{A}$, for which bidding
truthfully is an equilibrium.


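THEOREM 14.8.2 (The Revelation Principle). Let $A$ be a direct auction
where $\{\beta_i\}_{i=1}^n$ is a Bayes-Nash equilibrium. Recall Definition 14.1.1. Then there is
another direct auction $\tilde{A}$, which is BIC and has the same winners and payments as
$A$ in equilibrium; i.e., for all $\mathbf{v} = (v_1, \ldots, v_n)$, if $b_i = \beta_i(v_i)$ and $\mathbf{b} = (b_1, \ldots, b_n)$,
then

$$\alpha^{A}[\mathbf{b}] = \alpha^{\tilde{A}}[\mathbf{v}] \quad \text{and} \quad \mathcal{P}^{A}[\mathbf{b}] = \mathcal{P}^{\tilde{A}}[\mathbf{v}].$$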


17 A direct auction is one in which each bidder submits a single bid to the auctioneer.

[Figure 14.7 (schematic): the values $V_1, \ldots, V_n$ are reported to $\tilde{A}$, which feeds the bids
$\beta_1(V_1), \ldots, \beta_n(V_n)$ into $A$ and returns the resulting allocation
$\alpha^{A}[\beta_1(V_1), \ldots, \beta_n(V_n)]$ and payments $\mathcal{P}^{A}[\beta_1(V_1), \ldots, \beta_n(V_n)]$.]

FIGURE 14.7. The figure illustrates the proof of the Revelation Principle. In
the auction $\tilde{A}$, bidders are asked to report their true values. The auction then
submits their equilibrium bids for them to the auction $A$ and then outputs
whatever $A$ would have output on those bids. The auction $\tilde{A}$ is BIC: it is a
Bayes-Nash equilibrium for the bidders to report their values truthfully.


PROOF. The auction $\tilde{A}$ operates as follows: On each input $\mathbf{v}$, $\tilde{A}$ computes
$\beta(\mathbf{v}) = (\beta_1(v_1), \ldots, \beta_n(v_n))$ and then runs $A$ on $\beta(\mathbf{v})$ to compute the output and
payments. (See Figure 14.7.) It is straightforward to check that if $\beta$ is a Bayes-Nash
equilibrium in $A$, then bidding truthfully is a Bayes-Nash equilibrium for $\tilde{A}$;
i.e., $\tilde{A}$ is BIC.


EXAMPLE 14.8.3. Recall Example 14.2.1, a first-price auction with two bidders
with U[0, 1] values. An application of the Revelation Principle to this auction yields
the following BIC auction: Allocate to the highest bidder and charge him half of
his bid.
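The construction in the Revelation Principle is essentially function composition: apply each bidder’s equilibrium strategy to his reported value, then run the original auction. The following is a minimal sketch with hypothetical names; ties are assumed to go to the lowest-indexed bidder.

```python
def first_price_auction(bids):
    """The original auction: the highest bid wins and pays its own bid.

    Assumption of this sketch: ties go to the lowest-indexed bidder.
    Returns (winner_index, payment).
    """
    winner = max(range(len(bids)), key=lambda i: bids[i])
    return winner, bids[winner]


def make_direct_auction(auction, strategies):
    """Wrap `auction` so that it accepts reported values instead of bids.

    Each bidder's equilibrium strategy is applied on his behalf before
    the original auction is run.
    """
    def direct_auction(reported_values):
        bids = [beta(v) for beta, v in zip(strategies, reported_values)]
        return auction(bids)
    return direct_auction


# Example 14.8.3: two U[0,1] bidders with equilibrium bid beta(v) = v/2.
half_bid_auction = make_direct_auction(
    first_price_auction, [lambda v: v / 2, lambda v: v / 2])
```

Calling `half_bid_auction([0.8, 0.6])` allocates to the first bidder and charges him 0.4, i.e., half of his reported value, as in the example.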


REMARK 14.8.4. As discussed at the beginning of this section, for some auction
formats, the actions of the bidder often go beyond submitting a single bid. The
Revelation Principle can be extended to these more general settings. Given any
auction format $A$ (that might involve complex interaction of bidders and auctioneer),
an auction $\tilde{A}$ is constructed as follows: Each bidder truthfully reveals his value
to a trusted third party who will calculate and implement his equilibrium strategy
for him. If $\tilde{A}$ includes this third party, then as far as each bidder is concerned, $\tilde{A}$
is a direct auction which is BIC.

For example, an application of the Revelation Principle to the English auction 
yields the Vickrey second-price auction. In some online English auctions, the bidder 
submits to an intermediary (e.g., eBay) a number v which is the maximum price he 
is willing to pay. The intermediary then bids on his behalf in the actual auction, 
only increasing his bid when necessary but never bidding more than v. 

The primary use of the Revelation Principle is in auction design: To determine 
which mappings from values to allocations can arise in Bayes-Nash equilibrium, it 
suffices to consider direct, BIC auctions. 
14.9. Myerson’s optimal auction


We now consider the design of the optimal single-item auction, that is, the auc- 
tion that maximizes the auctioneer’s expected revenue over all auctions, in Bayes- 
Nash equilibrium. A key assumption here is that the auction designer knows the 
prior distributions from which the bidders’ values are drawn. 


14.9.1. The optimal auction for a single bidder. Consider a seller with 
a single item to sell where there is only one potential buyer. Suppose also that the 
seller knows that the buyer’s value V is drawn from a distribution F with support 
$[0, \infty)$. If the seller offers the item to the buyer at a price of $p$, the buyer will
purchase it if and only if his value $V$ is at least $p$, which occurs with probability
$1 - F(p)$. Thus, the auctioneer’s expected revenue will be

\[ R(p) := p(1 - F(p)). \]

DEFINITION 14.9.1. The monopoly reserve price for the distribution F, 
denoted by p* := p* (F), is defined as the price that maximizes auctioneer revenue; 
i.e., 

\[ p^* = \operatorname{argmax}_p R(p). \tag{14.20} \]


EXAMPLE 14.9.2. When the buyer’s value is U[0, 1], the expected seller revenue
for price $p$ is $R(p) = p(1 - p)$, and the monopoly reserve price is $p^* = 1/2$.
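
When no closed form is at hand, the monopoly reserve price can be approximated numerically. The following sketch uses a plain grid search; the function name, the grid resolution, and the second test distribution (an exponential with rate 2) are illustrative assumptions.

```python
import math


def monopoly_reserve(cdf, lo, hi, steps=100_000):
    """Grid search for the price p in [lo, hi] maximizing R(p) = p * (1 - F(p))."""
    best_p, best_rev = lo, lo * (1 - cdf(lo))
    for k in range(1, steps + 1):
        p = lo + (hi - lo) * k / steps
        rev = p * (1 - cdf(p))
        if rev > best_rev:
            best_p, best_rev = p, rev
    return best_p, best_rev


# U[0, 1] values, as in Example 14.9.2.
p_uniform, r_uniform = monopoly_reserve(lambda p: min(max(p, 0.0), 1.0), 0.0, 1.0)

# Exponential values with rate 2: R(p) = p * exp(-2p), maximized at p = 1/2.
p_expo, r_expo = monopoly_reserve(lambda p: 1 - math.exp(-2 * p), 0.0, 10.0)

print(p_uniform, r_uniform, p_expo, r_expo)
```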


THEOREM 14.9.3. Consider a single buyer with value distribution F in a single- 
item setting. Suppose that the buyer maximizes his expected utility given his value 
and the auction format. Then the maximum expected revenue that can be achieved 
by any auction format is R(p*). This revenue is attained by setting a reserve price 
of p*, the monopoly reserve price (which yields a truthful auction). 


PROOF. By the Revelation Principle (Theorem 14.8.2), we only need to consider
optimizing over direct BIC auctions. Such an auction is defined by a mapping
$\alpha[v]$ that gives the probability of allocation when the reported value is $v$. Recall
that $\alpha[v] = a(v)$ must be increasing and that the expected payment is determined
by the allocation probability via part (3) of Theorem 14.6.1. (We will fix $p(0) = 0$.)

Any allocation rule $a(v)$ can be implemented by picking $U \sim U[0, 1]$ and allocating
the item if $a(v) \ge U$, i.e., by offering the (random) price $\Psi = \min\{w \mid a(w) \ge U\}$.
This auction is truthful, hence BIC. The resulting allocation probability is $a(v)$
and the expected revenue is

\[ \mathbb{E}[R(\Psi)] \le R(p^*) \]

by (14.20).


REMARK 14.9.4. As a consistency check, observe that with the notation of the
above proof, the expected payment of a bidder with value $v$ (who buys only if the
price $\Psi$ is below his value) is

\[ \mathbb{E}\bigl[\Psi \mathbf{1}_{\{\Psi \le v\}}\bigr] = \int_0^v \mathbb{P}(w < \Psi \le v)\, dw = \int_0^v [a(v) - a(w)]\, dw, \]

in agreement with part (3) of Theorem 14.6.1.
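
As a concrete instance (an illustrative choice, not taken from the text), suppose the allocation rule is $a(v) = v$ on $[0, 1]$, and write $\Psi$ for the random price offered in the proof. Then $\Psi = U$, and both sides of the identity can be computed by hand:

```latex
\[
\mathbb{E}\bigl[\Psi \mathbf{1}_{\{\Psi \le v\}}\bigr]
  = \int_0^v w \, dw = \frac{v^2}{2},
\qquad
\int_0^v [a(v) - a(w)] \, dw = \int_0^v (v - w) \, dw = \frac{v^2}{2}.
\]
```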


EXAMPLE 14.9.5. Suppose that the value $V$ of the bidder is drawn from the
distribution $F(x) = 1 - 1/x$, for $x \ge 1$. Then $\mathbb{E}[V] = \infty$. However, any price $p \ge 1$
is accepted with probability $1/p$, resulting in an expected revenue of 1. Thus the
maximum expected revenue can be arbitrarily smaller than the expected value.
A digression. We can use the result of Theorem 14.9.3 to bound the expected
revenue of a seller in the following extensive-form game.


EXAMPLE 14.9.6 (The fishmonger’s problem). There is a seller of fish and 
a buyer who enjoys consuming a fresh fish every day. The buyer has a private value 
V for each day’s fish, drawn from a publicly known distribution F. 

However, this value is drawn only once; i.e., the buyer has the same unknown 
value on all days. Each day, for n days, the seller sets a price for that day’s fish, 
which of course can depend on what happened on previous days. The buyer can 
then decide whether to buy a fish at that price or to reject it. The goal of the buyer 
is to maximize his total utility (his value minus the price on each day he buys and 0 
on other days), and the goal of the seller is to maximize revenue. How much money 
can the seller make in n days? 


One possible seller strategy is to commit to a daily price equal to the monopoly 
reserve price p*, for a total expected revenue of nR(p*) after n days. However, the
seller has the freedom to adapt the price each day based on the interaction in prior 
days and potentially obtain a higher revenue. For example, if the buyer were to 
buy on day 1 at price p, it might make sense for the seller to set a price higher than 
p on day 2, whereas if the buyer rejects this price, it might make sense to lower the 
price the next day. 

More generally, a seller strategy $S_I$ is a binary tree of prices. For instance,
if $n = 2$, the seller has to set a single price for day 1 and two prices for day 2,
depending on the action of the buyer on day 1. The buyer strategy $S_{II}$ is the best
response to the seller strategy and his own value $v$. Thus, $S_{II}(S_I, v)$ specifies a
binary decision (buy or don’t buy) at each node in the seller tree $S_I$; these decisions
are chosen to maximize the buyer’s utility given $v$.


CLAIM 14.9.7. Let $R_n(S_I, v)$ denote the $n$-day revenue of the seller when she
uses pure strategy $S_I$, the buyer’s value is $v$, and he uses his best response $S_{II}(S_I, v)$.
Then for any $S_I$,

\[ \mathbb{E}[R_n(S_I, V)] \le nR(p^*). \]


PROOF. The pair of strategies used by the buyer and the seller determines for
each possible value $v$ of the buyer a sequence $(p_i(v), a_i(v))$, for $1 \le i \le n$, where
$p_i(v)$ is the asking price on day $i$ and $a_i(v) \in \{0, 1\}$ indicates the response of the
buyer. Thus,

\[ \mathbb{E}[R_n(S_I, V)] = \mathbb{E}\Bigl[\sum_{1 \le i \le n} a_i(V)\, p_i(V)\Bigr]. \]

Now use $S_I$ to construct a single-item, single-round direct auction with $1/n$ of
this expected revenue as follows: Ask the buyer to submit his value $v$. Compute
$S_{II}(S_I, v)$ to find $(p_i(v), a_i(v))_{i=1}^n$. Finally, pick a uniformly random day $i$ between
1 and $n$, and sell the item to the buyer (at price $p_i(v)$) if and only if $a_i(v) = 1$.
The resulting mechanism is BIC since the outcome for the buyer is the result of
simulating his best response to $S_I$ given his value. The resulting expected seller
revenue is $\mathbb{E}[R_n(S_I, V)]/n$ which, by Theorem 14.9.3, is at most $R(p^*)$.
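
The bound can be checked numerically for $n = 2$. The sketch below assumes a U[0, 1] value (so $2R(p^*) = 1/2$), represents the seller's price tree by three hypothetical parameters (`p1` for day 1, `p_buy` and `p_rej` for day 2 after a purchase or a rejection), and breaks the buyer's ties in favor of buying.

```python
def two_day_revenue(p1, p_buy, p_rej, v):
    """Seller revenue when a buyer with value v best-responds to the price tree."""
    # Branch 1: buy on day 1, then face price p_buy on day 2.
    u_buy = (v - p1) + max(0.0, v - p_buy)
    r_buy = p1 + (p_buy if v >= p_buy else 0.0)
    # Branch 2: reject on day 1, then face price p_rej on day 2.
    u_rej = max(0.0, v - p_rej)
    r_rej = p_rej if v >= p_rej else 0.0
    return r_buy if u_buy >= u_rej else r_rej


def expected_revenue(p1, p_buy, p_rej, n_grid=2000):
    """Average revenue over a midpoint grid of values v in (0, 1)."""
    values = [(k + 0.5) / n_grid for k in range(n_grid)]
    return sum(two_day_revenue(p1, p_buy, p_rej, v) for v in values) / n_grid


# Committing to the monopoly price 1/2 on both days.
committed = expected_revenue(0.5, 0.5, 0.5)

# Raising the price after a purchase and lowering it after a rejection.
adaptive = expected_revenue(0.5, 0.7, 0.3)

# Coarse search over price trees with prices in {0.1, ..., 0.9}.
prices = [k / 10 for k in range(1, 10)]
best = max(expected_revenue(a, b, c, 500)
           for a in prices for b in prices for c in prices)

print(committed, adaptive, best)
```

In this instance the adaptive tree earns less than the committed price, because a strategic buyer with a high value prefers to reject on day 1 in order to get the lower day 2 price.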


14.9.2. A two-bidder special case. Suppose there are two bidders, where
bidder 1’s value $V_1$ is known to be exponential with parameter $\lambda_1$ and bidder 2’s
value $V_2$ is known to be exponential with parameter $\lambda_2$. How should an auctioneer
facing these two bidders design a BIC auction to maximize his revenue?

Let $A$ be an auction where truthful bidding ($\beta_i(v) = v$ for all $i$) is a Bayes-Nash
equilibrium, and suppose that its allocation rule is $\alpha : \mathbb{R}^2 \to \mathbb{R}^2$. Recall that
$\alpha[b] := (\alpha_1[b], \alpha_2[b])$, where $\alpha_i[b]$ is the probability (see footnote 18) that the item
is allocated to bidder $i$ on bid vector $b = (b_1, b_2)$ and $a_i(v_i) = \mathbb{E}[\alpha_i(v_i, V_{-i})]$.

The goal of the auctioneer is to choose $\alpha[\cdot]$ to maximize

\[ \mathbb{E}[p_1(V_1) + p_2(V_2)]. \]


To understand this expression, fix one of the bidders, say $i$, and let $a_i(v)$, $u_i(v)$,
and $p_i(v)$ denote his allocation probability, expected utility and expected payment,
respectively (see footnote 19), given that $V_i = v$ and both bidders are bidding their
values. Using condition (3) from Theorem 14.6.1, we have

\[ \mathbb{E}[u_i(V_i)] = \int_0^\infty \int_0^v a_i(w)\, dw\, \lambda_i e^{-\lambda_i v}\, dv. \]

Reversing the order of integration, we get

\[ \mathbb{E}[u_i(V_i)] = \int_0^\infty a_i(w) \int_w^\infty \lambda_i e^{-\lambda_i v}\, dv\, dw = \int_0^\infty a_i(w) e^{-\lambda_i w}\, dw. \tag{14.21} \]


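Since $u_i(v) = v a_i(v) - p_i(v)$, we obtain

\[ \begin{aligned}
\mathbb{E}[p_i(V_i)] &= \int_0^\infty v\, a_i(v) \lambda_i e^{-\lambda_i v}\, dv - \int_0^\infty a_i(w) e^{-\lambda_i w}\, dw \\
&= \int_0^\infty a_i(v) \Bigl[v - \frac{1}{\lambda_i}\Bigr] \lambda_i e^{-\lambda_i v}\, dv.
\end{aligned} \tag{14.22} \]

Letting $m_i := 1/\lambda_i$ (the mean of the exponential distribution), we conclude that

\[ \begin{aligned}
\mathbb{E}[p_1(V_1) + p_2(V_2)] &= \mathbb{E}[a_1(V_1)(V_1 - m_1) + a_2(V_2)(V_2 - m_2)] \\
&= \int_0^\infty \int_0^\infty [\alpha_1(v)(v_1 - m_1) + \alpha_2(v)(v_2 - m_2)]\, \lambda_1 \lambda_2 e^{-\lambda_1 v_1 - \lambda_2 v_2}\, dv_1\, dv_2.
\end{aligned} \tag{14.23} \]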


Thus, to maximize his expected revenue, the auctioneer should maximize this quantity
subject to two constraints: (1) For each $v$, the item is allocated to at most one
bidder, i.e., $\alpha_1(v) + \alpha_2(v) \le 1$, and (2) the auction is BIC.

With only the first constraint in mind, on bid vector $v$, the auctioneer should
never allocate if $v_1 < m_1$ and $v_2 < m_2$, but otherwise, he should allocate to the
bidder with the higher value of $v_i - m_i$. (We can ignore ties since they have zero
probability.)

It turns out that following this prescription (setting the payment of the winner
to be the threshold bid for winning; see footnote 20), we obtain a truthful auction,
which is therefore BIC. To see this, consider first the case where $\lambda_1 = \lambda_2$. The auction is then a


18 The randomness here is in the auction itself. 

19 Throughout this section, we assume individual rationality; that is, $p_i(0) = 0$ and hence

20 The threshold bid is the infimum of the bids a bidder could submit and still win the 
auction. 
Vickrey auction with a reserve price of $m_1$. If $\lambda_1 \ne \lambda_2$, then the allocation rule is

\[ \alpha_i(b_1, b_2) = \begin{cases} 1, & \text{if } b_i - m_i > \max(0, b_{-i} - m_{-i}); \\ 0, & \text{otherwise.} \end{cases} \tag{14.24} \]

If bidder 1 wins, he pays $m_1 + \max(0, b_2 - m_2)$, with a similar formula for bidder 2.


EXERCISE 14.e. Show that this auction is truthful; i.e., it is a dominant strategy 
for each bidder to bid truthfully. 


REMARK 14.9.8. Perhaps surprisingly, when $\lambda_1 \ne \lambda_2$, the item might be allocated
to the lower bidder. See also Exercise 14.14.
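
The allocation rule (14.24) and its threshold payments fit in a few lines. In this sketch the function name and the sample rates and bids are illustrative assumptions; bidders are indexed 0 and 1 rather than 1 and 2.

```python
def two_bidder_exponential_auction(bids, rates):
    """Allocation rule (14.24) with threshold payments.

    bids  -- (b_1, b_2), the reported values
    rates -- (lambda_1, lambda_2), the exponential parameters
    Returns (winner index or None, payment).
    """
    m = [1.0 / lam for lam in rates]           # m_i = 1 / lambda_i
    score = [bids[0] - m[0], bids[1] - m[1]]   # b_i - m_i
    for i in (0, 1):
        j = 1 - i
        if score[i] > max(0.0, score[j]):
            # Threshold bid: the smallest bid with which bidder i still wins.
            return i, m[i] + max(0.0, score[j])
    return None, 0.0


# With rates (1, 2) the means are (1, 0.5). Bidder 0 bids more (1.2 vs 1.0),
# yet bidder 1 has the larger b_i - m_i and wins.
print(two_bidder_exponential_auction([1.2, 1.0], [1.0, 2.0]))
```

This reproduces the situation in Remark 14.9.8: the item goes to the lower bidder because her value exceeds her own mean by more.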


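14.9.3. A formula for the expected payment. Fix an allocation rule $\alpha[\cdot]$
and a specific bidder with value $V$ that is drawn from the density $f(\cdot)$. As usual, let
$a(v)$, $u(v)$, and $p(v)$ denote his allocation probability, expected utility, and expected
payment, respectively, given that $V = v$ and all bidders are bidding truthfully.

Using condition (3) from Theorem 14.6.1, we have

\[ \mathbb{E}[u(V)] = \int_0^\infty \int_0^v a(w)\, dw\, f(v)\, dv. \]

Reversing the order of integration, we get

\[ \begin{aligned}
\mathbb{E}[u(V)] &= \int_0^\infty a(w) \int_w^\infty f(v)\, dv\, dw \\
&= \int_0^\infty a(w)(1 - F(w))\, dw.
\end{aligned} \tag{14.25} \]

Thus, since $u(v) = v a(v) - p(v)$, we obtain

\[ \begin{aligned}
\mathbb{E}[p(V)] &= \int_0^\infty v\, a(v) f(v)\, dv - \int_0^\infty a(w)(1 - F(w))\, dw \\
&= \int_0^\infty a(v) \Bigl[v - \frac{1 - F(v)}{f(v)}\Bigr] f(v)\, dv.
\end{aligned} \tag{14.26} \]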

REMARK 14.9.9. The quantity in the square brackets in (14.26) is called the
bidder’s virtual value. Contrast the expected payment in (14.26) with the expectation
of the value allocated to the bidder using $a(\cdot)$, that is,

\[ \int_0^\infty a(v)\, v\, f(v)\, dv. \]


The latter would be the revenue of an auctioneer using allocation rule a(-) in a 
scenario where the buyer could be charged his full value. 

The difference between the value and the virtual value captures the auctioneer’s 
loss of revenue that can be ascribed to a buyer with value v, due to the buyer’s 
value being private. 
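
Formula (14.26) can be sanity-checked numerically for a single bidder facing a posted price, where the expected payment is known directly to be $p(1 - F(p))$. The sketch below uses a midpoint rule; the helper names, the truncation of the exponential at 10, and the two test prices are assumptions made for illustration.

```python
import math


def virtual_value(v, cdf, pdf):
    """The bracketed term in (14.26): v - (1 - F(v)) / f(v)."""
    return v - (1.0 - cdf(v)) / pdf(v)


def expected_virtual_surplus(alloc, cdf, pdf, lo, hi, n=100_000):
    """Midpoint-rule approximation of the right-hand side of (14.26)."""
    h = (hi - lo) / n
    total = 0.0
    for k in range(n):
        v = lo + (k + 0.5) * h
        total += alloc(v) * virtual_value(v, cdf, pdf) * pdf(v) * h
    return total


def posted_price(price):
    """Allocation rule a(v) = 1 if v >= price, else 0."""
    return lambda v: 1.0 if v >= price else 0.0


# U[0, 1] value, posted price 0.3: direct revenue is 0.3 * 0.7 = 0.21.
uniform_rhs = expected_virtual_surplus(
    posted_price(0.3), lambda v: v, lambda v: 1.0, 0.0, 1.0)

# Exponential value with rate 2, posted price 0.5: direct revenue is 0.5 * exp(-1).
expo_rhs = expected_virtual_surplus(
    posted_price(0.5),
    lambda v: 1.0 - math.exp(-2.0 * v),
    lambda v: 2.0 * math.exp(-2.0 * v),
    0.0, 10.0)

print(uniform_rhs, expo_rhs)
```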


14.9.4. The multibidder case. We now consider the general case of $n$ bidders.
The auctioneer knows that bidder $i$’s value $V_i$ is drawn independently from
a strictly increasing distribution $F_i$ on $[0, h]$ with density $f_i$. Let $A$ be an auction
where truthful bidding ($\beta_i(v) = v$ for all $i$) is a Bayes-Nash equilibrium, and suppose
that its allocation rule is $\alpha : \mathbb{R}^n \to \mathbb{R}^n$. Recall that $\alpha[v] := (\alpha_1[v], \ldots, \alpha_n[v])$,
where $\alpha_i[v]$ is the probability (see footnote 21) that the item is allocated to bidder $i$
on bid (see footnote 22) vector $v = (v_1, \ldots, v_n)$ and $a_i(v_i) = \mathbb{E}[\alpha_i(v_i, V_{-i})]$.

The goal of the auctioneer is to choose $\alpha[\cdot]$ to maximize

\[ \mathbb{E}\Bigl[\sum_i p_i(V_i)\Bigr]. \]


DEFINITION 14.9.10. For agent $i$ with value $v_i$ drawn from distribution $F_i$, the
virtual value of agent $i$ is

\[ \psi_i(v_i) := v_i - \frac{1 - F_i(v_i)}{f_i(v_i)}. \]

In the example of §14.9.2, $\psi_i(v_i) = v_i - 1/\lambda_i$.


In §14.9.3 we proved the following proposition:


LEMMA 14.9.11. The expected payment of agent $i$ in an auction with allocation
rule $\alpha(\cdot)$ is

\[ \mathbb{E}[p_i(V_i)] = \mathbb{E}[a_i(V_i)\, \psi_i(V_i)]. \]


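Summing over all bidders, this means that in any auction, the expected
auctioneer revenue is the expected virtual value of the winning bidder.
Note, however, that the auctioneer directly controls $\alpha(v)$ rather than
$a_i(v_i) = \mathbb{E}[\alpha_i(v_i, V_{-i})]$. Expressing the expected revenue in terms of $\alpha(\cdot)$, we obtain

\[ \begin{aligned}
\mathbb{E}\Bigl[\sum_i p_i(V_i)\Bigr] &= \mathbb{E}\Bigl[\sum_i a_i(V_i)\, \psi_i(V_i)\Bigr] \\
&= \int_0^\infty \cdots \int_0^\infty \Bigl[\sum_i \alpha_i(v)\, \psi_i(v_i)\Bigr] f_1(v_1) \cdots f_n(v_n)\, dv_1 \cdots dv_n.
\end{aligned} \tag{14.27} \]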


Clearly, if there is a BIC auction with allocation rule $\alpha[\cdot]$ that maximizes

\[ \sum_i \alpha_i(v)\, \psi_i(v_i) \tag{14.28} \]

for every $v$, then it maximizes expected revenue (14.27). A key constraint on
$\alpha[\cdot]$ is that $\sum_i \alpha_i(v) \le 1$. To maximize (14.28), if $\max_i \psi_i(v_i) < 0$, then we
should set $\alpha_i(v) = 0$ for all $i$. Otherwise, we should allocate to the bidder with the
highest virtual value (breaking ties by reported value, for instance). This discussion
suggests the following auction:


DEFINITION 14.9.12. The Myerson auction for distributions with strictly in- 
creasing virtual value functions is defined by the following steps: 


(i) Solicit a bid vector b from the agents. 
(ii) Allocate the item to the bidder with the largest virtual value $\psi_i(b_i)$ if
positive, and otherwise, do not allocate. That is (see footnote 23),


21 The randomness here is in the auction itself. 

22 Note that since we are restricting attention to auctions for which truthful bidding is a 
Bayes-Nash equilibrium, we are assuming that b = (b1,...,bn) = v. 

23 Break ties uniformly at random. Ties have 0 probability. 
\[ \alpha_i(b_i, b_{-i}) = \begin{cases} 1, & \text{if } \psi_i(b_i) > \max_{j \ne i} \psi_j(b_j) \text{ and } \psi_i(b_i) \ge 0; \\ 0, & \text{otherwise.} \end{cases} \tag{14.29} \]

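(iii) If the item is allocated to bidder $i$, then she is charged her threshold bid
$t_*(b_{-i})$, the minimum value she could bid and still win, i.e.,

\[ t_*(b_{-i}) := \min\{b : \psi_i(b) \ge \max(0, \{\psi_j(b_j)\}_{j \ne i})\}. \tag{14.30} \]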

THEOREM 14.9.13. Suppose that the bidders’ values are independent with strictly 
increasing virtual value functions. Then the Myerson auction is optimal; i.e., it 
maximizes the expected auctioneer revenue in Bayes-Nash equilibrium. Moreover, 
bidding truthfully is a dominant strategy. 


PROOF. The proof of truthfulness of the Myerson auction is similar to that
for the Vickrey auction. Bidder $i$’s utility in the auction is bounded above by
$\max(v_i - t_*(b_{-i}), 0)$, where the first term is the utility for winning and the second
term is the utility for losing. This maximum is achieved by bidding truthfully
as long as $\psi_i(\cdot)$ is strictly increasing, since $v_i > t_*(b_{-i})$ if and only if
$\psi_i(v_i) > \max(0, \{\psi_j(b_j)\}_{j \ne i})$.

Since the auction is truthful, it is also BIC. Optimality follows from the
discussion after (14.27).
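
The steps of the Myerson auction can be sketched as follows, assuming strictly increasing virtual value functions whose inverses are available in closed form. The function name and the two example priors (U[0, 1] with $\psi(v) = 2v - 1$ and U[0, 2] with $\psi(v) = 2v - 2$) are illustrative assumptions; bidders are indexed from 0.

```python
def myerson_auction(bids, psi, psi_inv):
    """Myerson auction for strictly increasing virtual value functions.

    psi[i]     -- virtual value function of bidder i
    psi_inv[i] -- its inverse
    Returns (winner index or None, payment).
    """
    virtual = [psi[i](b) for i, b in enumerate(bids)]
    winner = max(range(len(bids)), key=lambda i: virtual[i])
    if virtual[winner] < 0:
        return None, 0.0  # no bidder clears a virtual value of zero
    rivals = [virtual[j] for j in range(len(bids)) if j != winner]
    # Threshold bid: smallest bid whose virtual value beats 0 and all rivals.
    threshold = psi_inv[winner](max([0.0] + rivals))
    return winner, threshold


# Bidder 0 ~ U[0, 1], bidder 1 ~ U[0, 2].
psi = [lambda v: 2 * v - 1, lambda v: 2 * v - 2]
psi_inv = [lambda y: (y + 1) / 2, lambda y: (y + 2) / 2]
print(myerson_auction([0.9, 1.2], psi, psi_inv))

# Two i.i.d. U[0, 1] bidders: the rule behaves like Vickrey with reserve 1/2.
iid = ([psi[0], psi[0]], [psi_inv[0], psi_inv[0]])
print(myerson_auction([0.8, 0.3], *iid))
```

In the asymmetric example bidder 0 wins despite the lower bid, because his virtual value 0.8 exceeds bidder 1's virtual value 0.4.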


COROLLARY 14.9.14. The Myerson optimal auction for i.i.d. bidders with strictly
increasing virtual value functions is the Vickrey auction with a reserve price of
$\psi^{-1}(0)$.


EXERCISE 14.f. Show that the virtual value function for a uniform distribution 
is strictly increasing. Use this to conclude that for bidders with i.i.d. U[0, 1] values, 
the Myerson auction is a Vickrey auction with a reserve price of 1/2. 


REMARK 14.9.15. The fact that truthfulness in the Myerson auction is a dom- 
inant strategy means that the bidders do not need to know the prior distributions 
of other bidders’ values. Other BIC auctions with the same allocation rule will also 
be optimal, but truthfulness need not be a dominant strategy. See Exercise 14.9
for an example.


REMARK 14.9.16. The Myerson auction of Definition 14.9.12 can be generalized
to the case where virtual valuations are weakly increasing. Step (i) remains
unchanged. In step (ii), a tie-breaking rule is needed. To keep the auction BIC, it
is crucial to use a tie-breaking rule that retains the monotonicity of the allocation
probabilities $a_i(\cdot)$. Three natural tie-breaking rules are


• break ties by bid (and at random if there is also a tie in the bids);
• break ties according to a predetermined fixed ordering of the bidders; and
• break ties uniformly at random (equivalently, assign a random ranking to
the bidders).
The resulting payment in step (iii) is still the threshold bid, the lowest bid the 
winner could have made without changing the allocation. See 


COROLLARY 14.9.17. Consider $n$ bidders with independent values and strictly
increasing virtual value functions. In the class of BIC auctions that always allocate
the item, i.e., have $\sum_i \alpha_i(b) = 1$ for all $b$, the optimal (revenue-maximizing) auction
allocates to the bidder with the highest virtual value. If this is bidder $i$, he is
charged $\psi_i^{-1}[\max_{j \ne i} \psi_j(b_j)]$.

PROOF. This follows from Lemma 14.9.11 and the proof of truthfulness of the
Myerson auction.


14.10. Approximately optimal auctions 


14.10.1. The advantage of just one more bidder. One of the downsides 
of implementing the optimal auction is that it requires that the auctioneer know the 
distributions from which agents’ values are drawn (in order to compute the virtual 
values). The following result shows that in lieu of knowing the distribution from 
which n i.i.d. bidders are drawn, it suffices to recruit just one more bidder into the
auction (see footnote 24).


THEOREM 14.10.1. Let F be a distribution for which virtual valuations are 
increasing. The expected revenue in the optimal auction with n i.i.d. bidders with 
values drawn from F is upper bounded by the expected revenue in a Vickrey auction 
with n+ 1 i.i.d. bidders with values drawn from F. 


PROOF. By Corollary 14.9.17, in the i.i.d. setting, the Vickrey auction maximizes
expected revenue among BIC auctions that always allocate.

Next, observe that one possible $(n + 1)$-bidder auction that always allocates the
item consists of, first, running the optimal auction with $n$ bidders and then, if the
item is unsold, giving the item to bidder $n + 1$ for free. This auction earns exactly
the optimal $n$-bidder revenue, so the $(n + 1)$-bidder Vickrey auction earns at least
as much.
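
For i.i.d. U[0, 1] bidders the comparison can be carried out exactly. The sketch below relies on two facts from the text: expected revenue equals the expected virtual value of the winner (Lemma 14.9.11), and for U[0, 1] the optimal auction is a Vickrey auction with reserve 1/2 (Exercise 14.f). The function name and the use of exact rational arithmetic are choices made for illustration.

```python
from fractions import Fraction


def vickrey_reserve_revenue(n, reserve):
    """Expected revenue of a Vickrey auction with a reserve price, n i.i.d. U[0, 1] bidders.

    Computed as the integral over [reserve, 1] of (2v - 1) * n * v**(n - 1),
    i.e., the virtual value 2v - 1 weighted by the density of the highest value.
    """
    def antiderivative(v):
        return Fraction(2 * n, n + 1) * v ** (n + 1) - v ** n

    return antiderivative(Fraction(1)) - antiderivative(Fraction(reserve))


for n in range(1, 6):
    optimal_n = vickrey_reserve_revenue(n, Fraction(1, 2))
    vickrey_n_plus_1 = vickrey_reserve_revenue(n + 1, 0)
    print(n, optimal_n, vickrey_n_plus_1, optimal_n <= vickrey_n_plus_1)
```

For $n = 2$ the optimal revenue is 5/12, while a Vickrey auction with three bidders and no reserve earns 1/2.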


14.10.2. When only the highest bidder can win. Consider the scenario 
in which an item is being sold by auction to one of n possible buyers whose values 
are drawn from some joint distribution, known to the auctioneer. We saw that for 
independent values, the optimal auction might reject the highest bid and allocate 
to another bidder. In some settings, this is prohibited, and the auctioneer can only 
allocate the item to the highest bidder (or not at all); e.g., he can run a Vickrey 
auction with a reserve price. The following auction called the Lookahead auction 
maximizes expected revenue in this class of auctions. 


(i) Solicit bids from the agents. Suppose that agent $i$ submits the highest bid
$b_i$. (If there are ties, pick one of the highest bidders arbitrarily.)

(ii) Compute the conditional distribution $F_i$ of $V_i$ given the bids $b_{-i}$ and the
event $V_i \ge \max_{j \ne i} b_j$. Let $p_i = p_i(b_{-i})$ be the price $p$ that maximizes
$p(1 - F_i(p))$.

(iii) Run the optimal single-bidder auction with agent $i$, using his previous bid
$b_i$ and the distribution $F_i$ for his value: This auction sells the item to
agent $i$ at price $p_i$ if and only if $b_i \ge p_i$.
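
The three steps can be sketched as follows. The conditional distribution is passed in as a hypothetical callable `conditional_survival(i, others, p)` returning $\mathbb{P}(V_i \ge p)$ given the other bids and the event that bidder $i$ is highest; the price is found by a grid search, which is an approximation. The example assumes independent U[0, 1] values, for which the conditional law of the highest value above a floor $m$ is uniform on $[m, 1]$.

```python
def lookahead_auction(bids, conditional_survival, price_grid):
    """Sketch of the Lookahead auction; returns (winner index or None, price)."""
    # Step (i): find the highest bidder.
    i = max(range(len(bids)), key=lambda k: bids[k])
    others = bids[:i] + bids[i + 1:]
    floor = max(others)
    # Step (ii): choose the price maximizing p * P(V_i >= p | other bids, i highest).
    candidates = [p for p in price_grid if p >= floor]
    price = max(candidates, key=lambda p: p * conditional_survival(i, others, p))
    # Step (iii): sell to bidder i at that price if his bid is high enough.
    return (i, price) if bids[i] >= price else (None, 0.0)


def uniform_survival(i, others, p):
    """Independent U[0, 1] values: V_i given V_i >= m is uniform on [m, 1]."""
    m = max(others)
    return (1.0 - p) / (1.0 - m)


grid = [k / 1000 for k in range(1001)]
print(lookahead_auction([0.8, 0.3], uniform_survival, grid))
print(lookahead_auction([0.8, 0.6], uniform_survival, grid))
print(lookahead_auction([0.4, 0.3], uniform_survival, grid))
```

In this independent uniform case the chosen price is the larger of the second-highest bid and 1/2, so the sketch behaves like a Vickrey auction with reserve 1/2.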


REMARK 14.10.2. The Lookahead auction can be implemented even when bidders’
values are not independent. The only difference is in the update stage:
computing $F_i$ is more involved. See Example 14.10.6.


24 under the questionable assumption that this bidder has the same value distribution as
the others.
PROPOSITION 14.10.3. The Lookahead auction is optimal among truthful auctions
that allocate to the highest bidder (if at all).


PROOF. Any truthful auction $A$, after conditioning on $b_i$ being the highest bid
and the values $b_{-i}$, becomes a truthful single bidder auction. Optimizing expected
revenue in that auction, by Theorem 14.9.3, yields the Lookahead auction.


14.10.3. The Lookahead auction is approximately optimal. As we know, 
the Lookahead auction is not the optimal truthful auction. However, the next the- 
orem shows that it is a factor two approximation. 


THEOREM 14.10.4. The Lookahead (LA) auction yields an expected auctioneer 
revenue that is at least half that of the optimal truthful and ex-post individually 
rational auction even when bidders have dependent values. 


PROOF. The expected revenue of an individually rational, truthful auction is 
the sum of its expected revenue from bidders that are not the highest, say L, plus 
its expected revenue from the highest bidder, say H. 

Assuming truthful bidding, it is immediate that the expected revenue of the 
LA auction is at least H since it is the optimal auction for this bidder conditioned 
on being highest and conditioned on the other bids. As for the other bidders, since 
the auction is ex-post individually rational, no bidder can be charged a price higher 
than her value. Thus, the optimal revenue achievable from these lower bidders is at 
most the maximum value of bidders in this set, say v* (since only one item is being 
sold). But the expected revenue from the highest bidder is at least v* since one
of the possible auctions is to just offer him a price of v*. Therefore the expected
revenue of the LA auction is also at least L. Consequently, it is at least
max(H, L), which is at least (H + L)/2.


REMARK 14.10.5. Note that Theorem 14.10.4 holds for independent values even
if virtual valuations are not increasing!


EXAMPLE 14.10.6. Two gas fields are being sold as a bundle via a Lookahead 
auction. X and Y are the profits that bidder 1 can extract from fields 1 and 2. 
Bidder 2 is more efficient than bidder 1, so he can extract 2X from the first but 
can’t reach the second. Thus 


\[ V_1 = X + Y \quad \text{and} \quad V_2 = 2X. \]


It is known to the auctioneer that X and Y are independent and distributed expo- 
nentially with parameter 1. 
There are two cases to consider since P(X = Y) = 0. 

• $V_2 < V_1$ or equivalently $Y > X$: Given this event and $V_2 = b_2$, the
conditional distribution of $Y$ is that of $X + Z$, where $Z$ is an independent
exponential with parameter 1. Thus, $F_1$ is the distribution of $b_2 + Z$. The
price $p_1 \ge b_2$ that maximizes

\[ p \cdot \mathbb{P}(b_2 + Z \ge p) = p \cdot e^{b_2 - p} \]

is $\max(b_2, 1)$. Thus, if $b_1 \ge \max(b_2, 1)$, then in step (iii) the item will be
sold to bidder 1 at the price $\max(b_2, 1)$. Notice that if $1 > b_1 > b_2$, then
the item will not be sold.




• $V_2 > V_1$ or equivalently $Y < X$: Given this event and $V_1 = b_1$, the
conditional distribution of $X$ is uniform (see footnote 25) on $[b_1/2, b_1]$. Therefore, $F_2$ is
uniform on $[b_1, 2b_1]$. Thus, if $b_2 > b_1$, then the item will be sold to bidder
2 at price $b_1$, since $p(1 - F_2(p))$ is decreasing on $[b_1, 2b_1]$.
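
The two cases can be verified numerically by maximizing the conditional revenue on a grid. In this sketch the function names are hypothetical, the search interval for the exponential case is truncated at $b_2 + 20$ as a practical assumption, and bidders are numbered 1 and 2 as in the example.

```python
import math


def best_price(revenue, lo, hi, steps=10_000):
    """Grid search for the price in [lo, hi] that maximizes revenue(p)."""
    grid = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    return max(grid, key=revenue)


def gas_field_lookahead(b1, b2):
    """Lookahead auction for the gas field example; returns (winner or None, price)."""
    if b1 > b2:
        # Given V_2 = b2 and V_1 > V_2, V_1 is distributed as b2 + Exp(1).
        price = best_price(lambda p: p * math.exp(b2 - p), b2, b2 + 20.0)
        return (1, price) if b1 >= price else (None, 0.0)
    # Given V_1 = b1 and V_2 > V_1, V_2 is uniform on [b1, 2 * b1].
    price = best_price(lambda p: p * (2 * b1 - p) / b1, b1, 2 * b1)
    return (2, price) if b2 >= price else (None, 0.0)


print(gas_field_lookahead(3.0, 2.0))   # bidder 1 highest, b2 above 1
print(gas_field_lookahead(0.8, 0.5))   # 1 > b1 > b2
print(gas_field_lookahead(1.0, 1.5))   # bidder 2 highest
```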


14.11. The plot thickens... 


We now briefly consider other settings, where the optimal auctions are surpris- 
ing or weird. 


EXAMPLE 14.11.1. (Single bidder, two items:) Suppose the seller has two 
items and there is a buyer whose private values (V1, V2) for the two items are known 
to be independent samples from distribution F. Suppose further that the buyer’s 
value for getting both items is $V_1 + V_2$. Since the values for the two items are
independent, one might think that the seller should sell each of them separately 
in the optimal way, resulting in twice the expected revenue from a single item. 
Surprisingly, this is not necessarily the case. Consider the following examples: 

(1) Suppose that each $V_i$ is equally likely to be 1 or 2. Then the optimal
revenue the seller can get separately from each item is 1: If he sells an
item at price 1, the buyer will buy it. If he sells at price 2, the buyer will
buy with probability 1/2. Thus, selling separately yields a total expected
seller revenue of 2. However, if the seller offers the buyer the bundle of
both items at a price of 3, the probability the buyer has $V_1 + V_2 \ge 3$ is
3/4, and so the expected revenue is $3 \cdot 3/4 = 2.25$, more than the optimal
revenue from selling the items separately.


(2) On the other hand, suppose that $V_i$ is equally likely to be 0 or 1. Then if
each item is sold separately, the optimal price is 1 and the overall expected
revenue is $2 \cdot 1/2 = 1$. In contrast, the optimal bundle price is 1,
and the expected revenue is only 3/4.


(3) When $V_i$ is equally likely to be 0, 1, or 2, then selling the items separately
yields expected revenue 4/3. That is also the expected revenue
from bundling. However, the auction which offers the buyer the choice
between any single item at price 2 or the bundle of both items at price 3
obtains an expected revenue of 13/9.
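
The revenues quoted in the three cases can be reproduced by enumerating the value profiles. The sketch below models each selling scheme as a menu of (bundle, price) options; the helper names are hypothetical, and ties in the buyer's utility are assumed to be broken in the seller's favor, which is what the figure 13/9 requires.

```python
from fractions import Fraction
from itertools import product


def expected_revenue(values, menu):
    """Expected revenue from a menu of (bundle, price) options.

    values -- equally likely values for each of the two items
    menu   -- list of (tuple of item indices, price)
    The buyer picks the option with the highest utility (or buys nothing);
    ties are broken toward the higher price, an assumption of this sketch.
    """
    profiles = list(product(values, repeat=2))
    total = Fraction(0)
    for v in profiles:
        options = [(0, 0)]  # (utility, price) of buying nothing
        for bundle, price in menu:
            options.append((sum(v[i] for i in bundle) - price, price))
        total += max(options)[1]
    return total / len(profiles)


def separate(price):
    """Each item offered at the same price; buying both costs twice the price."""
    return [((0,), price), ((1,), price), ((0, 1), 2 * price)]


def bundle(price):
    """Only the bundle of both items is offered."""
    return [((0, 1), price)]


print(expected_revenue([1, 2], separate(1)), expected_revenue([1, 2], bundle(3)))
print(expected_revenue([0, 1], separate(1)), expected_revenue([0, 1], bundle(1)))
mixed_menu = [((0,), 2), ((1,), 2), ((0, 1), 3)]
print(expected_revenue([0, 1, 2], separate(2)),
      expected_revenue([0, 1, 2], bundle(2)),
      expected_revenue([0, 1, 2], mixed_menu))
```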


EXAMPLE 14.11.2. Revenue when values are correlated: Earlier, we con- 
sidered the optimal auction when there are two bidders whose values are uniform 
on [0,1]. In this case, the expected value of the highest bidder is 2/3, and yet the 
auction which maximizes the seller’s revenue obtains an expected revenue of only 
5/12. This loss is the “price” the auctioneer has to pay because the values of the
bidders are private (see footnote 26). However, when bidders’ values are correlated,
the auctioneer can take advantage of the correlation to extract more revenue, in
some cases the full expected maximum value!


25 Given the sum $X + Y$ of two i.i.d. exponentials, the conditional distribution of each one
is uniform on $[0, X + Y]$.
26 This is sometimes called the “information rent” of the bidder. 
For example, suppose that there are two agents and they each have value either
10 or 100, with the following joint distribution. (The entries in the table are the
probabilities of each of the corresponding pairs of values.)


             | V_2 = 10 | V_2 = 100
  V_1 = 10   |   1/3    |   1/6
  V_1 = 100  |   1/6    |   1/3


We consider the following (symmetric) auction, from the perspective of bidder 1:
The initial allocation and pricing is determined by a second-price auction yielding

\[ u_1(10) = 0, \tag{14.31} \]
\[ u_1(100) = \mathbb{P}[V_2 = 10 \mid V_1 = 100] \cdot (100 - 10) = \tfrac{1}{3} \cdot 90 = 30, \tag{14.32} \]

since whenever both bidders are truthful and submit the same bid, they obtain a
utility of 0. However, the buyers must commit up-front to the following additional
rules:


• If $V_2 = 10$, then bidder 1 will receive $30 + \epsilon$.
• If $V_2 = 100$, then bidder 1 will be charged $60 - \epsilon$.

(The symmetric rule is applied to bidder 2.) The impact of these rules is to reduce
bidder 1’s expected utility to $\epsilon$: If her value is 10, then the payoff from the extra
rules is

\[ \begin{aligned}
&\mathbb{P}[V_2 = 10 \mid V_1 = 10] \cdot (30 + \epsilon) - \mathbb{P}[V_2 = 100 \mid V_1 = 10] \cdot (60 - \epsilon) \\
&\qquad = \tfrac{2}{3} \cdot (30 + \epsilon) - \tfrac{1}{3} \cdot (60 - \epsilon) = \epsilon,
\end{aligned} \tag{14.33} \]

whereas if her value is 100, the payoff from the extra rules is

\[ \begin{aligned}
&\mathbb{P}[V_2 = 10 \mid V_1 = 100] \cdot (30 + \epsilon) - \mathbb{P}[V_2 = 100 \mid V_1 = 100] \cdot (60 - \epsilon) \\
&\qquad = \tfrac{1}{3} \cdot (30 + \epsilon) - \tfrac{2}{3} \cdot (60 - \epsilon) = \epsilon - 30.
\end{aligned} \tag{14.34} \]

Combining (14.31) and (14.33) for the case $V_1 = 10$ and combining (14.32) and
(14.34) for the case $V_1 = 100$ shows that her combined expected utility from the
second-price auction and the extra rules is always $\epsilon$. Since the setting is symmetric,
bidder 2’s expected utility is also $\epsilon$.

Finally, since

\[ 2\epsilon = u_1(V_1) + u_2(V_2) = \mathbb{E}[(V_1 a_1(V_1) - p_1(V_1)) + (V_2 a_2(V_2) - p_2(V_2))] \]

and the allocation is always to the bidder with the highest value, the auctioneer’s
expected revenue is

\[ \mathbb{E}[p_1(V_1) + p_2(V_2)] = \mathbb{E}[\max(V_1, V_2)] - 2\epsilon, \]

essentially the maximum possible (in expectation).

In addition, truth-telling is a Bayes-Nash equilibrium. However, a bidder may 
be very unhappy with this auction after the fact. For example, if both agents have 
value 100, they both end up with negative utility ($-60 + \epsilon$)!
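
The arithmetic of this example can be checked with exact fractions. In the sketch below the function names and the particular value of $\epsilon$ are illustrative assumptions; truthful bidding is taken as given, so the second-price stage pays a bidder $\max(v_1 - v_2, 0)$ in utility and collects $\min(v_1, v_2)$ in revenue.

```python
from fractions import Fraction

# Joint distribution of (V_1, V_2) from the table in the example.
JOINT = {
    (10, 10): Fraction(1, 3), (10, 100): Fraction(1, 6),
    (100, 10): Fraction(1, 6), (100, 100): Fraction(1, 3),
}


def side_payment(other_value, eps):
    """Extra rule: receive 30 + eps if the other value is 10, pay 60 - eps if it is 100."""
    return 30 + eps if other_value == 10 else -(60 - eps)


def interim_utility(v1, eps):
    """Bidder 1's expected utility given V_1 = v1, under truthful bidding."""
    prob_v1 = sum(p for (a, _), p in JOINT.items() if a == v1)
    return sum(
        p / prob_v1 * (max(v1 - v2, 0) + side_payment(v2, eps))
        for (a, v2), p in JOINT.items() if a == v1
    )


def expected_revenue(eps):
    """Second-price revenue minus the side payments made to both bidders."""
    return sum(
        p * (min(v1, v2) - side_payment(v2, eps) - side_payment(v1, eps))
        for (v1, v2), p in JOINT.items()
    )


eps = Fraction(1, 100)
expected_max = sum(p * max(v1, v2) for (v1, v2), p in JOINT.items())
print(interim_utility(10, eps), interim_utility(100, eps))
print(expected_revenue(eps), expected_max - 2 * eps)
```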
REMARK 14.11.3. In this example, the buyer’s ex-interim expected utility (that
is, his expected utility after seeing his value, but before seeing other bidders’ val- 
ues) is positive. However, after the auction concludes, the buyer’s utility may be 
negative. In other words, this auction is not ex-post individually rational. Most 
commonly used auctions are ex-post individually rational. 


Notes 


There are several excellent texts on auction theory, including the books by Kr- 
ishna [Kri09], Menezes and Monteiro [MM05], and Milgrom [Mil04]. Other sources that 
take a more computational perspective are the forthcoming book by Hartline [Har12], the 
lecture notes by Roughgarden (Lectures 2-6), and Chapters 9 and 13 in [Nis07].

The first game-theoretic treatment of auctions is due to William Vickrey [Vic61], 
who analyzed the second-price auction and developed several special cases of the Revenue 
Equivalence Theorem. These results played an important role in Vickrey’s winning the 
1996 Nobel Prize, which he shared with James Mirrlees “for their fundamental contribu- 
tions to the economic theory of incentives under asymmetric information.” 


William Vickrey Roger Myerson 


The Revenue Equivalence Theorem was proved by Myerson and by Riley and
Samuelson [RS81]. Myerson’s treatment was the most general: He developed the
Revelation Principle (see footnote 27) from §14.8 and optimal (revenue-maximizing) auctions for a number
of different settings. For this and “for having laid the foundations of mechanism design
theory,” Roger Myerson, Leonid Hurwicz, and Eric Maskin won the 2007 Nobel Prize in
Economics.

The Revelation Principle also applies to equilibrium concepts other than Bayes-Nash 
equilibrium. For example, the Revelation Principle for dominant strategies says that if A 
is an auction with dominant strategies $\{\beta_i\}_{i=1}^n$, then there is another auction $\tilde{A}$ for which
truth-telling is a dominant strategy and which has the same winner and payments as A. 
The proof is essentially the same. 

In we developed Myerson’s optimal auction for the case where virtual valua- 
tions are (weakly) increasing. Myerson’s paper solves the general case. These 
results were further developed and clarified by Bulow and Roberts and Bulow and 
Klemperer [BK94]. The approximately optimal auction from §14.10.1 is from [BK94]; the
proof given here is due to Kirkegaard [Kir06]. The Lookahead auction in §14.10.2 is due


27 According to Myerson [Mye12], the Revelation Principle was independently discovered by
others including Dasgupta, Hammond, Maskin, Townsend, Holmstrom, and Rosenthal, building 
on earlier ideas of Gibbard and Aumann. 
to Ronen [Ron01]. The examples in §14.11 are taken from Hart and Nisan and
Daskalakis et al. [DDT12]. Surprisingly, there are situations where the optimal mecha- 
nism may not even be deterministic [DDT12]. However, Babaioff et al. 
show that in the setting of a single additive buyer, with a seller with multiple items for 
sale, if the bidders’ valuations for the different items are independent, then the better of 
selling separately and selling the grand bundle achieves expected revenue within a con- 
stant factor of optimal. For other recent developments related to optimal auctions, see 
e.g., [CDW16]. 


The war of attrition and generalizations thereof have been studied extensively. See,
e.g., [BK99]. A different type of war of attrition is shown in Figure 14.8.


FIGURE 14.8. Dawkins describes the behavior of emperor penguins 
in the Antarctic. They have been observed standing on an ice ledge hesitating 
before diving in to catch their dinner because of the danger of a predator 
lurking below. Eventually, one of them, the loser of this game, jumps in and 
(if he’s not eaten), the others follow. 


Example 14.11.2 is adapted from and [CM88]. Cremer and McLean [CM88]
showed that full surplus extraction is possible in a broad range of correlated settings.
Uniform price auctions (Example 14.6.5) have been applied to the sale of sports tickets by
Baliga and Ely [Tral4]. The generalization to the case where individual bidders demand 
multiple units is discussed in [Kri09]. 

Example 14.9.6 and other variants of the fishmonger problem are studied in Hart and
Tirole [HT88], as well as [DPS15]. These papers focus on the case where the seller 
cannot commit to future prices. 

a fom KIEL follows and is 
from [Syr12]. As discussed in the notes of the price of anarchy in games of
incomplete information and auctions has been extensively studied in recent years. For a 
detailed treatment, see, e.g., [Har12]. 

Many of the results developed in this chapter, e.g., Myerson’s optimal auction, apply
to settings much more general than single-item auctions, e.g., to the win/lose settings
discussed in §15.2. We refer the reader to Klemperer’s excellent guide to the literature on
auction theory [Kle99a] for further details and additional references.
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Exercises 


14.1. Show that the three-bidder, single-item auction in which the item is allo- 
cated to the highest bidder at a price equal to the third highest bid is not 
truthful. 


14.2. Consider a Vickrey second-price auction with two bidders. Show that for 
each choice of bidder 1’s value $v_1$ and any possible bid $b_1 \ne v_1$ he might
submit, there is a bid by the other bidder that yields bidder 1 strictly less 
utility than he would have gotten had he bid truthfully. 


14.3. Suppose that each agent’s value $V_i$ is drawn independently from the same
strictly increasing distribution $F$ on $[0, h]$. Find the symmetric Bayes-Nash
equilibrium bidding strategy in

• a second-price auction with a reserve price of r,
• a first-price auction with a reserve price of r,
• an all-pay auction with a reserve price of r.


14.4. Consider a descending auction for a single item. The auctioneer starts at 
a very high price and then lowers the price continuously. The first bidder 
who indicates that he will accept the current price wins the auction at that 
price. Show that this auction is equivalent to a first-price auction; i.e., any 
equilibrium bidding strategy in the first-price auction can be mapped to 
an equilibrium bidding strategy in this auction and will result in the same 
allocation and payments. 


S 14.5. Find a symmetric equilibrium in the war of attrition auction discussed in 
under the assumption that bids are committed to up-front, rather 
than in the more natural setting where a player’s bid (the decision as to 
how long to stay in) can be adjusted over the course of the auction. 


14.6. Consider Example 14.5.2 again. Suppose that the bidders’ values are inde-
pendent, but not identically distributed, with $V_i \sim F_i$. Find the revenue-
maximizing evaluation fee for the seller (assuming that a second-price auc- 
tion will be run). The same evaluation fee must be used for all bidders. 


14.7. Prove that the single-item, single-bidder auction described in §14.9.1 is a
special case of Myerson’s optimal auction. 


14.8. (a) Show that the Gaussian and the equal-revenue ($F(x) = 1 - 1/x$ for
$x \ge 1$) distributions have increasing virtual value functions.
(b) Show that the following distribution does not have an increasing virtual
value function: Draw a random variable that is U[0,1/2] with probability
2/3 and a random variable that is U[1/2,1] with probability 1/3.


14.9. Consider the following two-bidder, single-item auction: Allocate to the
highest bidder if his bid $b$ is at least 1/2, and if so, charge him $(b/2) + 1/(8b)$.
Otherwise, don’t allocate the item. Show that for two bidders with i.i.d.
U[0, 1] values, this auction is BIC.


14.10. Show that the Bayes-Nash price of anarchy for a first-price auction is 
strictly larger than 1 for two bidders, where bidder 1 has value $V_1 \sim U[0, 1]$
and $V_2 \sim U[0, 2]$. See §14.7. Hint: Suppose that the higher bidder always
wins and then apply the revenue equivalence.


14.11. Under the conditions of Theorem 14.7.1, show that

\[ \mathbb{E}[V^*] \ge \left(1 - \frac{1}{e}\right) \mathbb{E}\left[\max_i V_i\right]. \]

See for a hint.


14.12. Show that if the auctioneer has a value of C for the item, i.e., his profit in 
a single-item auction is the payment he receives minus C (or 0 if he doesn’t 
sell), then with n i.i.d. bidders (with strictly increasing virtual valuation 
functions), the auction which maximizes his expected profit is Vickrey with 
a reserve price of $\psi^{-1}(C)$.


S 14.13. Determine the explicit payment rule for the three tie-breaking rules dis-
cussed in Remark 14.9.16.


S 14.14. Consider two bidders where bidder 1’s value is drawn from an exponential 
distribution with parameter 1 and bidder 2’s value is drawn independently 
from U[0, 1]. What is the Myerson optimal auction in this case? Show that 
if (v1, v2) = (1.5, 0.8), then bidder 2 wins. 


14.15. Consider n bidders, where $V_i \sim F_i$, where the $V_i$’s are independent and
virtual values are increasing. Let $r_i^*$ be the monopoly reserve price for $F_i$
(recall (14.20)). Show that the following auction obtains at least half the
revenue obtained by the optimal truthful and ex-post individually rational
auction.
• Ask the bidders to report their values.
• If the highest bidder, say bidder 1, reports $b_1 \ge r_1^*$, then he wins the
auction at a price equal to $\max(b_2, r_1^*)$, where $b_2$ is the report of the
second-highest bidder. Otherwise, there is no winner.


14.16. Show how to generalize Theorem 14.10.4 to a scenario in which k identical
items are being sold.


14.17. Show that if bidders’ values are i.i.d. from a regular distribution F, then
the expected revenue of the first-price auction with reserve price $\psi^{-1}(0)$ is
the same as that of the Myerson optimal auction.


14.18. Consider the following game known as the Wallet Game: Each of two 
bidders privately checks how much money she has in her wallet, say $s_i$ for
bidder $i$ ($i = 1, 2$). Suppose an English auction is run where the prize is
the combined contents of the two wallets. That is, the price goes up con-
tinuously until one of the bidders quits, say at price p. At that point, the
remaining bidder pays p and receives $s_1 + s_2$. Find an equilibrium in this
game. Can you find an asymmetric equilibrium? 


14.19. Show that the lookahead auction does not obtain a better than 2 approxi- 
mation to the optimal auction. 


14.20. Consider an auctioneer selling an item by auction to buyers whose valua-
tions $V_1, V_2, \ldots, V_n$ are drawn from a correlated joint distribution F (not
a product distribution). In this case, the characterization of Bayes-Nash
equilibrium (Theorem 14.6.1) does not hold. Explain where the proof given
there breaks down.


14.21. Prove that Theorem 14.6.1 holds if A is an auction for selling k items. In
fact, prove that it holds in any win/lose setting when each agent’s value $V_i$
is drawn independently from $F_i$. For a definition of win/lose settings, see

14.22. Consider an auction in which k identical items are being sold. Each of n 
bidders is interested in only one of these items. Each bidder’s value for the 
item is drawn independently from the same prior distribution F. Use the 
result of the previous exercise to derive the optimal auction (i.e., general-
ize Theorem 14.9.13) for this setting, assuming that $\psi(v) = v - \frac{1 - F(v)}{f(v)}$ is
strictly increasing.


Licensed to AMS. 
License or copyright restrictions may apply to redistribution; see http://www.ams.org/publications/ebooks/terms 


CHAPTER 15 


Truthful auctions in win/lose settings 


In a truthful auction, bidders do not need to know anything about their com- 
petitors or perform complex calculations to determine their strategy in the auction. 
In this chapter, we focus exclusively on auctions with this property. 


15.1. The second-price auction and beyond 


EXAMPLE 15.1.1 (Spectrum auctions). The government is running an auc- 
tion to sell the license for the use of a certain band of electromagnetic spectrum. 
Its goal is to allocate the spectrum to the company that values it most (rather than 
maximizing revenue). This value is indicative of how efficiently the company can 
utilize the bandwidth. One possibility is to run an English auction - we have seen 
that it is a dominant strategy for bidders to stay in the auction as long as the price 
is below their value. This ensures that the license is sold to the bidder with the 
highest value. A key difference between the English auction and the second-price 
auction is that in the latter, the highest bidder reveals his value. This could be 
damaging later; e.g., it could lead to a higher reserve price in a subsequent auc- 
tion. Thus an auction that is truthful in isolation might not be truthful if the same 
players are likely to participate in a future auction. 


FIGURE 15.1. Competing rocket companies. 


EXAMPLE 15.1.2 (Hiring a contractor). Several rocket companies are com-
peting for a NASA contract to bring an astronaut to the space station. One pos-
sibility is for NASA to run a second-price procurement auction (where the
auctioneer is a buyer instead of a seller): Ask each company to submit a bid, and
award the contract to the lowest bidder, but pay him the second-lowest bid. The
utility of the winning bidder is the price he is paid minus his cost.¹ Again, it is
a dominant strategy for each company to bid its actual cost. Alternatively, to re- 
duce the amount of information the winning company reveals, NASA can run a 
descending auction: Start from the maximum price NASA is willing to pay and 
keep reducing the price until exactly one company remains, which is then awarded 
the contract at that price. It is a dominant strategy for each company to stay in 
until the price reaches its cost. 


EXAMPLE 15.1.3 (A shared communication channel). Several users in a 
large company have data streams to transmit over a shared communication channel. 
The data stream of each user has a publicly known bandwidth requirement, say $w_i$,
and that user has a private value $v_i$ for getting his data through. If the total
capacity of the channel is C, then the set of data streams selected must have a
total bandwidth which does not exceed the channel capacity. I.e., only sets of data
streams S with $\sum_{i \in S} w_i \le C$ are feasible. Suppose that the company decides to
use an auction to ensure that the subset of streams selected has the largest total 
value. To do so requires incentivizing the users to report their values truthfully. 


15.2. Win/lose allocation settings 


We now formalize a setting which captures all of these examples. 


DEFINITION 15.2.1. A win/lose allocation problem² is defined by:


• A set U of participants/bidders, where each has a private value $v_i$ for
“winning” (being selected) and obtains no value from losing;
• a set of feasible allocations (i.e., possible choices for the set of winning
bidders) $\mathcal{L} \subseteq 2^U$. In a single-item auction, $\mathcal{L}$ contains all subsets of
size at most 1 and in the communication channel example,
$\mathcal{L} = \{S \subseteq U \mid \sum_{i \in S} w_i \le C\}$.
A sealed-bid auction, or mechanism, A in such a setting asks each bidder to
submit a bid $b_i$. The mechanism then selects (possibly randomly) a single winning
set L in $\mathcal{L}$ and a payment vector $(p_i)_{i \in U}$, where $p_i$ is the payment³ required from
bidder i. The mapping from bid vectors to winning sets is called the allocation
rule and is denoted by $a[\mathbf{b}] = (a_1[\mathbf{b}], \ldots, a_n[\mathbf{b}])$, where $a_i[\mathbf{b}]$ is the probability
that bidder i wins when the bid vector is $\mathbf{b} = (b_1, \ldots, b_n)$. (If the auction is
deterministic then $a_i[\mathbf{b}] \in \{0, 1\}$.) We denote the payment rule of the auction
by $p[\mathbf{b}] = (p_1[\mathbf{b}], \ldots, p_n[\mathbf{b}])$, where $p_i[\mathbf{b}]$ is the expected payment⁴ of bidder i when
the bid vector is $\mathbf{b}$. (This expectation is taken over the randomness in the auction.)


1 This cost incorporates the bidders’ time and effort. 
2 This setting commonly goes under the name “single-parameter problem”. 
3 Unless otherwise specified, the payment of each losing bidder is 0. 


4 The quantity $p[\mathbf{b}] = (p_1[\mathbf{b}], \ldots, p_n[\mathbf{b}])$ was denoted by $A[\mathbf{b}] = (A_1[\mathbf{b}], \ldots, A_n[\mathbf{b}])$ in
Definition 14.1.1 of the previous chapter.


Suppose that bidder i has private value $v_i$ for winning. Then his utility on bid
vector $\mathbf{b}$ is

\[ u_i[\mathbf{b} \mid v_i] = v_i a_i[\mathbf{b}] - p_i[\mathbf{b}]. \]

The utility of the auctioneer is his income:

\[ \sum_{i \in U} p_i[\mathbf{b}]. \]

DEFINITION 15.2.2. We say that a mechanism M is truthful if for every agent,
it is a dominant strategy to bid his value. Formally, for all $\mathbf{b}_{-i}$, all $i$, $v_i$, and $b_i$,

\[ u_i[v_i, \mathbf{b}_{-i} \mid v_i] \ge u_i[b_i, \mathbf{b}_{-i} \mid v_i], \]

where $u_i[b_i, \mathbf{b}_{-i} \mid v_i] = v_i a_i[\mathbf{b}] - p_i[\mathbf{b}]$.


15.3. Social surplus and the VCG mechanism 


DEFINITION 15.3.1. Consider an auction with n bidders where the private value
of bidder i is $v_i$. Suppose that the set of winning bidders is $L^*$ and the payment
of bidder i is $p_i$ for each i. The social surplus of this outcome is the sum of the
utilities of all the bidders and the auctioneer; that is,

\[ \sum_{i \in U} \left( v_i 1_{\{i \text{ winner}\}} - p_i \right) + \sum_{i \in U} p_i = \sum_{i \in L^*} v_i. \]

(Since the payments are “losses” to the bidders and “gains” to the auctioneer,
they cancel out.)


The VCG mechanism for maximizing social surplus in win/lose settings 


• Ask each bidder to report his private value $v_i$ (which he may or may not
report truthfully). Assume that bidder i reports $b_i$.

• Choose as the winning set a feasible $L \in \mathcal{L}$ that maximizes $b(L)$, where
$b(L) = \sum_{j \in L} b_j$. Call this winning set $L^*$.

• To compute payments, let $\mathcal{L}_i^+ = \{S \mid i \notin S,\ S \cup \{i\} \in \mathcal{L}\}$. Then i only wins if his
bid $b_i$ satisfies

\[ b_i + \max_{L \in \mathcal{L}_i^+} b(L) \ge \max_{L \in \mathcal{L}_i^-} b(L), \]   (15.1)

where $\mathcal{L}_i^-$ is the collection of sets in $\mathcal{L}$ that do not contain i. His payment
is his threshold bid, the minimum $b_i$ for which (15.1) holds; i.e.,

\[ p_i = \max_{L \in \mathcal{L}_i^-} b(L) - \max_{L \in \mathcal{L}_i^+} b(L). \]   (15.2)


The payment given by (15.2) precisely captures the externality imposed by
bidder i, i.e., the reduction in total (reported) values obtained by the other bidders
due to i’s presence in the auction.


THEOREM 15.3.2. The VCG mechanism is truthful. Moreover, with truthful
bidding, $0 \le p_i \le v_i$ for all i and therefore the auction is individually rational.


PROOF. Fix any set of bids $\mathbf{b}_{-i}$ of all bidders but i and note that $p_i$ is deter-
mined by $\mathbf{b}_{-i}$. Whatever $b_i$ is, player i’s utility is at most $v_i - p_i$ if $v_i > p_i$ and at
most 0 if $v_i \le p_i$. Bidding truthfully guarantees bidder i this maximum utility of
$\max(0, v_i - p_i) \ge 0$.


REMARK 15.3.3. Note that if payments were instead equal to $p_i' = p_i + h_i(\mathbf{b}_{-i})$,
for any function $h_i(\cdot)$, the mechanism would still be truthful. However, it would no
longer be true that each bidder would be guaranteed nonnegative utility.


EXERCISE 15.a. Check that the Vickrey second-price auction and the Vickrey 
k-unit auction are both special cases of the VCG mechanism. 


REMARK 15.3.4. Since the VCG auction incentivizes truth-telling in dominant 
strategies, henceforth we assume that bidders bid truthfully. Therefore, we will 
not refer to the bids using the notation $\mathbf{b} = (b_1, \ldots, b_n)$, but rather will assume
they are $\mathbf{v} = (v_1, \ldots, v_n)$.


15.4. Applications 


15.4.1. Shared communication channel, revisited. See Figure 15.2 for an
example of the application of VCG to choosing a feasible winning set of maximum 
total value. 


  Bidder i | Bandwidth requirement $w_i$ (public) | Value $v_i$ (private) | Output of VCG auction
  1        | 0.4                                  | 1                     | wins, pays 2.1 - 2 = 0.1
  2        | 0.5                                  | 2                     | wins, pays 2.1 - 1 = 1.1
  3        | 0.8                                  | 2.1                   | does not win

  Input to VCG auction: the values and requirements above, for a communication channel of capacity 1.


FIGURE 15.2. This figure illustrates the execution of the VCG algorithm on
Example 15.1.3 (a shared communication channel) when C = 1 and there
are three bidders with the given values and weights. In this example,
$\mathcal{L} = \{\{1, 2\}, \{1\}, \{2\}, \{3\}, \emptyset\}$. With the given values, the winning set selected is
{1,2}. To compute, for example, bidder 1’s payment, we observe that without
bidder 1, the winning set is {3} for a value of 2.1. Therefore the loss of value
to other bidders due to bidder 1’s presence is 2.1 - 2 = 0.1.
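
The computation in Figure 15.2 can be reproduced with a brute-force sketch of the VCG mechanism. The function name `vcg_win_lose` and the representation of the feasible family as a predicate are illustrative choices, not notation from the text; the sketch enumerates all subsets, so it is only meant for tiny instances, and it assumes the empty set is feasible.

```python
from itertools import combinations

def vcg_win_lose(bids, feasible):
    """Brute-force VCG for a win/lose setting.

    bids: dict mapping bidder -> reported value b_i.
    feasible: predicate on a frozenset of bidders (membership in the family L).
    Returns (winning set L*, dict of payments for the winners).
    Assumes the empty set is feasible.
    """
    bidders = list(bids)

    def total(s):
        return sum(bids[j] for j in s)

    # Enumerate every feasible set (exponential; fine for small examples).
    feasible_sets = [frozenset(s)
                     for r in range(len(bidders) + 1)
                     for s in combinations(bidders, r)
                     if feasible(frozenset(s))]
    winners = max(feasible_sets, key=total)

    payments = {}
    for i in winners:
        # Best total for the others when i is excluded (max over sets without i).
        best_without_i = max(total(s) for s in feasible_sets if i not in s)
        # Best total for the others when i is forced to win.
        best_with_i = max(total(s - {i}) for s in feasible_sets if i in s)
        payments[i] = best_without_i - best_with_i  # threshold bid, as in (15.2)
    return winners, payments

# Data of Figure 15.2: capacity 1, three bidders.
weights = {1: 0.4, 2: 0.5, 3: 0.8}
values = {1: 1.0, 2: 2.0, 3: 2.1}
winners, payments = vcg_win_lose(
    values, lambda s: sum(weights[i] for i in s) <= 1.0)
print(sorted(winners), payments)
```

Up to floating-point rounding, the winners are bidders 1 and 2 with payments 0.1 and 1.1, matching the figure.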


15.4.2. Spanning tree auctions. Netflix wishes to set up a distribution net- 
work for its streaming video. The links that can be used for streaming form a graph, 
and each link is owned by a different service provider. Netflix must purchase the 
use of a set of links that will enable it to reach all nodes in the graph. For each 
link $\ell$, the owner incurs a private cost $c_\ell \in [0, C]$ for transmitting the Netflix data.
Netflix runs an auction to select the spanning tree of minimum cost in the graph
(owners of the selected links will be the “winners”). In this setting, the feasible
sets $\mathcal{L}$ are the spanning trees of the graph.

This is a social surplus maximization problem, with $v_\ell := -c_\ell$. The VCG
mechanism for buying a minimum spanning tree (MST) is the following:


• Ask each bidder (link owner) $\ell$ to report his (private) cost $c_\ell$.

FIGURE 15.3. This figure shows the outcome and payments in a spanning
tree auction. The blue labels on edges in the left figure are the costs of the 
edges. The minimum spanning tree consists of the gray highlighted edges. 
The payments are shown in pink on the right side of the figure. 


• Choose the MST $T^*$ with respect to the reported costs.
• Pay each winning link owner $\ell$ his threshold bid, which is C if removing
it disconnects the graph, and otherwise


(cost of MST with $\ell$ deleted) - (cost of MST with $\ell$ contracted),


which is always at most C. (This is the maximum cost the owner of $\ell$ can
report and still be part of the MST.) Contracting an edge (i, j) consists
of identifying its endpoints, thereby creating a new merged node ij, and
replacing all edges (i, k) and (j, k), $k \ne i, j$, with an edge (ij, k).


See Figure 15.3 for an example.
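
The payment rule above can be sketched with Kruskal's algorithm. This is a minimal illustration under stated assumptions: the names `kruskal` and `vcg_spanning_tree`, the encoding of links as distinct `(cost, u, v)` triples on nodes `0..n-1`, and the small test graph are all hypothetical and are not the graph of Figure 15.3. The graph is assumed to be connected.

```python
def kruskal(n, edges, skip=None, contract=None):
    """Minimum spanning tree on nodes 0..n-1 via Kruskal's algorithm.

    edges: list of distinct (cost, u, v) triples.
    skip: an edge to delete first; contract: an edge whose endpoints are merged first.
    Returns (total cost, list of tree edges), or None if no spanning tree exists.
    """
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    components = n
    if contract is not None:
        parent[find(contract[1])] = find(contract[2])
        components -= 1
    cost, tree = 0, []
    for edge in sorted(edges):
        if edge == skip or edge == contract:
            continue
        c, u, v = edge
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            cost += c
            tree.append(edge)
            components -= 1
    return (cost, tree) if components == 1 else None


def vcg_spanning_tree(n, edges, cap):
    """Return {edge: payment} for the edges of the MST under the reported costs."""
    _, tree = kruskal(n, edges)
    payments = {}
    for edge in tree:
        deleted = kruskal(n, edges, skip=edge)
        if deleted is None:            # removing the link disconnects the graph
            payments[edge] = cap
        else:
            contracted = kruskal(n, edges, contract=edge)
            payments[edge] = deleted[0] - contracted[0]
    return payments


# Hypothetical example: a triangle 0-1-2 plus a pendant node 3, with C = 5.
links = [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3)]
print(vcg_spanning_tree(4, links, cap=5))
```

In this example the two cheap triangle links are each paid 3 (the cost of the link that would replace them), and the pendant link, whose removal disconnects the graph, is paid C = 5.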


In fact, the VCG auction for this example can be implemented as a “descending 
auction”. It works as follows: Initialize the asking price for each link to C. 


• Reduce the asking price on each link uniformly (e.g., at time t, the asking
price is C - t) until some link owner declares the price unacceptable and
withdraws from the auction.

• If at this point, there is a link $\ell$ whose removal would disconnect the graph,
buy it at the current price and contract $\ell$.


See Figure 15.4 for a sample execution of the descending auction.
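
A rough simulation of the descending auction, assuming each owner withdraws exactly when the asking price reaches his cost. The function name, the `(cost, u, v)` edge encoding, and the example graph are hypothetical (the graph is not the one in Figure 15.4). Instead of literally contracting a purchased link, the sketch keeps it in the graph, which has the same effect when testing whether another link's removal would disconnect the graph.

```python
def descending_auction(n, edges, cap):
    """Simulate the descending auction on nodes 0..n-1.

    edges: list of distinct (cost, u, v) triples for a connected graph; cap is C.
    Returns {edge: price at which the link was bought}.
    """
    active = set(edges)   # links already bought or still in the auction
    bought = {}

    def is_bridge(edge):
        # Search from one endpoint without using `edge`.
        _, start, goal = edge
        seen, stack = {start}, [start]
        while stack:
            x = stack.pop()
            for other in active:
                if other == edge:
                    continue
                _, u, v = other
                for a, b in ((u, v), (v, u)):
                    if a == x and b not in seen:
                        seen.add(b)
                        stack.append(b)
        return goal not in seen

    def buy_bridges(price):
        for edge in sorted(active):
            if edge not in bought and is_bridge(edge):
                bought[edge] = price

    buy_bridges(cap)                       # links that are indispensable from the start
    for edge in sorted(edges, reverse=True):
        if edge in bought:
            continue
        active.discard(edge)               # owner withdraws when the price hits his cost
        buy_bridges(edge[0])
    return bought


links = [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3)]
print(descending_auction(4, links, cap=5))
```

On this small example the purchase prices are 3, 3 and 5, the same as the VCG threshold payments for the same graph.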


15.4.3. Public project. Suppose that the government is trying to decide 
whether or not to build a new library which will cost C dollars. Each person in 
society has his own value v; for this library, that is, how much it is worth to that 
person to have the library built. A possible goal for the government is to make sure 
that the library is built if the population’s total value for it is at least C dollars, 
and not otherwise. How can the government incentivize the members of society to 
truthfully report their values?


FIGURE 15.4. This figure shows the evolution of the descending auction for 
purchasing a spanning tree on the example shown in In this 
example C > 2. When the price reaches 2 (upper left), edge (a,b) is ready to 
drop out. The consecutive figures (top to bottom, and within each row, left 
to right) show the points at which edges are selected and the corresponding 
payments. 


The social surplus in this setting is 0 if the library isn’t built and $\sum_i v_i - C$ if
the library is built. We apply VCG to maximize social surplus:
• Ask each person i to report his value $v_i$ for the library.
• Build the library if and only if $\sum_i v_i \ge C$.
• If the library is built, person i pays his threshold bid: If $\sum_{j \ne i} v_j \ge C$, he
pays nothing. Otherwise, he pays

\[ p_i = C - \sum_{j \ne i} v_j. \]
In practice, this scheme is not implemented as it suffers from several problems. 
• The government might not recover its cost. For example, if there are n
people and each has value C/(n - 1), then all payments will be 0, but
the library will be built. Unfortunately, this is inevitable; no truthful
mechanism can simultaneously balance the budget and maximize social
surplus. See the chapter notes.
• It is deeply susceptible to collusion. If two people report that their values
are C, then the library will be built and nobody will pay anything.
• Our technical definition of “value” is the amount a person is willing to
pay for an item. It is not appropriate in this example. Indeed, the library
may be more valuable to someone who cannot pay for it than to someone
who can (e.g., has the resources to buy books). In this case, there is
discord between the intuitive meaning of social surplus and the technical
definition.
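
The public-project mechanism and the first two problems can be checked numerically. The function name `public_project_vcg` and the specific numbers (C = 90, four people) are illustrative assumptions; integer reports are used to avoid rounding issues.

```python
def public_project_vcg(reports, cost):
    """VCG for the public project: returns (build?, list of payments)."""
    total = sum(reports)
    build = total >= cost
    payments = []
    for r in reports:
        others = total - r
        # Threshold bid: the shortfall left uncovered by the others, if any.
        payments.append(max(0, cost - others) if build else 0)
    return build, payments


print(public_project_vcg([50, 60], 90))          # built; payments 30 and 40
print(public_project_vcg([30, 30, 30, 30], 90))  # each value is C/(n-1): built, nobody pays
print(public_project_vcg([90, 90, 0], 90))       # two colluders report C: built, nobody pays
```

In the first call the payments add up to 70, less than the cost of 90, so even without collusion the government does not recover its cost.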


15.4.3.1. VCG might not be envy-free. 


DEFINITION 15.4.1. We say that a truthful mechanism in a win/lose setting is 
envy-free if, when bidding truthfully, each bidder prefers his own allocation and 
payment to that of any other bidder. That is, for every $\mathbf{v}$ and all $i$, $j$, we have


\[ a_i[\mathbf{v}] v_i - p_i[\mathbf{v}] \ge a_j[\mathbf{v}] v_i - p_j[\mathbf{v}]. \]


EXAMPLE 15.4.2. Suppose that the government is trying to decide whether or 
not to build a bridge from the mainland to a big island A or to a small island B. 
The cost of building a bridge is $90 million. Island A has a million people, and each 
one values the bridge at $100. Island B has five billionaires, and each one values 
the bridge at $30 million. Running VCG will result in a bridge to island B and 
payments of 0. In this case, the people on island A will envy the outcome for the 
people on island B. 
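
The numbers in this example can be checked directly. The following worked computation, in millions of dollars, assumes (as the example suggests but does not state explicitly) that exactly one bridge is built and that each person's VCG payment is the externality he imposes on everyone else.

```latex
\begin{aligned}
\text{Island A:}\quad & 10^{6} \times 10^{-4} = 100, & \text{surplus } & 100 - 90 = 10,\\
\text{Island B:}\quad & 5 \times 30 = 150,           & \text{surplus } & 150 - 90 = 60,\\
\text{Island B without one billionaire:}\quad & 4 \times 30 = 120, & \text{surplus } & 120 - 90 = 30 > 10.
\end{aligned}
```

Since the bridge to B is still chosen when any single person is removed, nobody changes the outcome for the others, so every externality, and hence every payment, is 0.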


FIGURE 15.5. A typical page of search results: some organic and some spon- 
sored. If you click on a sponsored search result, the associated entity pays the 
search engine some amount of money. Some of the most expensive keywords 
relate to lawyers, insurance, and mortgages, and have a cost per click upwards 
of $40. Clicks on sponsored search slots for other keywords can go for as low
as a penny.


FIGURE 15.6. Behind the scenes when you do a query for a keyword, like 
“Taylor Swift”, in a search engine: At that moment, some of the advertisers 
who have previously bid on this keyword participate in an instantaneous auc- 
tion that determines which sponsored search slot, if any, they are allocated to 
and how much they will have to pay the search engine in the event of a user 
click. Notice that they may or may not bid truthfully. In this example, the 
highest bidder’s ad (in this case, bidder 2) is allocated to the top slot, and the 
price $p_1$ he is charged per click is the bid of the second-highest bidder (bidder
3).


15.5. Sponsored search auctions, GSP, and VCG 


What most people don’t realize is that all that money comes in pennies at a
time.⁵
    Hal Varian, Google chief economist


EXAMPLE 15.5.1. Sponsored search auctions: An individual performing a 
search, say for the term “Hawaii timeshare”, in a search engine receives a page of 
results containing the links the search engine has deemed relevant to the search, 
together with sponsored links, i.e., advertisements. These links might lead to the 
webpages of hotels and companies selling timeshares in Hawaii. To have their ads 
shown in these slots, these companies participate in an instantaneous auction. 

In this auction, each interested advertiser reports⁶ a bid $b_i$ representing the
maximum he is willing to pay when a searcher clicks on his ad. The search engine, 
based on these bids, decides which ad to place in each slot and what price to charge 
the associated advertiser in the event of a user click. 


5 Google’s revenue in 2015 was approximately $74,500,000,000.
6 Usually this bid is submitted in advance.


Suppose there are k ad slots⁷ with publicly known clickthrough rates $c_1 > c_2 > \cdots > c_k > 0$.
The clickthrough rate of a slot is the probability that a user
viewing the webpage will click on an ad in that slot. If bidder i has value $v_i$ per
click, then the expected value he obtains from having his ad assigned to slot j is
$v_i c_j$. In this setting, the social surplus of the allocation which assigns slot j to
bidder $\pi_j$ is $\sum_{j=1}^{k} v_{\pi_j} c_j$.

This is not formally a win/lose auction (because of the clickthrough rates), but 
the VCG mechanism readily extends to this case: The social surplus maximizing 
allocation is selected, and the price a bidder pays is the externality his presence 
imposes on others. Specifically: 


• Each bidder is asked to submit a bid $b_i$ representing the maximum he is
willing to pay per click.

• The bidders are reordered so that their bids satisfy $b_1 \ge b_2 \ge \cdots$, and slot
i is allocated to bidder i for $1 \le i \le k$.

• The participation of bidder i pushes each bidder $j > i$ from slot $j - 1$ to
slot j (with the convention that $c_{k+1} = 0$). Thus, i’s participation imposes
an expected cost of $b_j(c_{j-1} - c_j)$ on bidder j in one search (assuming that
$b_j$ is j’s value for a click). The auctioneer then charges bidder i a price of
$p_i(\mathbf{b})$ per click, chosen so that his expected payment in one search equals
the total externality he imposes on other bidders; i.e.,

\[ c_i p_i(\mathbf{b}) = \sum_{j=i+1}^{k+1} b_j (c_{j-1} - c_j). \]   (15.3)

In other words, bidder i’s payment per click is then

\[ p_i(\mathbf{b}) = \sum_{j=i+1}^{k+1} b_j \, \frac{c_{j-1} - c_j}{c_i}. \]   (15.4)


Figure 15.6 shows the timing of events in a sponsored search auction.
Figure 15.7 shows the allocation, payments, and advertiser utilities that result from
maximizing social surplus for a two-slot example, and Figure 15.8 shows the allo-
cation and payments as a function of a particular bidder’s bid.
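
Formula (15.4) translates into a few lines of code. This is a sketch only: the function name `vcg_sponsored_search` is hypothetical, bids are taken as given (no claim about how a real search engine computes prices), and the example uses the bids 7, 6, 1 and clickthrough rates 1, 0.5 that appear in the caption of Figure 15.7.

```python
def vcg_sponsored_search(bids, ctrs):
    """Per-click VCG prices from (15.4).

    bids: per-click bids, in any order; ctrs: clickthrough rates c_1 >= ... >= c_k > 0.
    Returns a list of (bidder index, price per click), one entry per filled slot.
    """
    order = sorted(range(len(bids)), key=lambda i: -bids[i])
    k = min(len(ctrs), len(bids))
    b = [bids[i] for i in order] + [0.0]   # zero bid stands in for a missing bidder k+1
    c = list(ctrs[:k]) + [0.0]             # convention c_{k+1} = 0
    result = []
    for s in range(k):                     # s is the 0-based slot index
        externality = sum(b[t] * (c[t - 1] - c[t]) for t in range(s + 1, k + 1))
        result.append((order[s], externality / c[s]))
    return result


print(vcg_sponsored_search([7, 6, 1], [1.0, 0.5]))
```

For these inputs the per-click prices are 3.5 for the top slot and 1.0 for the second slot.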


15.5.1. Another view of the VCG auction for sponsored search. First 
suppose that all clickthrough rates are equal; e.g., $c_i = 1$ for $1 \le i \le k$. Then we
have a simple k-item Vickrey auction (the items are slots) where bidder i’s value
for winning slot j is $v_i$ and the threshold bid for each bidder is the $k$th highest of
the other bids. Thus, the payment of each winning bidder is the $(k+1)$st highest
bid $b_{k+1}$ and the utility of winning bidder i is $v_i - b_{k+1}$. Since the payment of a
winner doesn’t depend on his bid, the auction is also truthful. Also, with truthful
bidding the auction is envy-free: A winner would not want to lose, and vice versa.

In the general case, the VCG auction A can be thought of as a “sum” of k
different auctions, where the $\ell$th auction $A_\ell$, for $1 \le \ell \le k$, is an $\ell$-unit auction
and the values and payments are scaled by $(c_\ell - c_{\ell+1})$. Thus, in $A_\ell$, the value to
bidder i of winning is $(c_\ell - c_{\ell+1}) v_i$, his payment is $(c_\ell - c_{\ell+1}) b_{\ell+1}$, and his utility
is $(c_\ell - c_{\ell+1})(v_i - b_{\ell+1})$.


7 The model we consider here greatly simplifies reality.


[Figure 15.7 diagram: three advertisers with bids $b_1 = 7$, $b_2 = 6$, $b_3 = 1$ and two slots.
  Slot 1 (bidder 1): CTR $c_1 = 1$; expected payment $c_1 p_1 = 6 \cdot 1 + 1 \cdot 0.5 - 6 \cdot 0.5$; PPC $p_1 = 3.5$; expected utility $c_1(v_1 - p_1) = 1 \cdot (7 - 3.5)$.
  Slot 2 (bidder 2): CTR $c_2 = 0.5$; expected payment $c_2 p_2 = 7 \cdot 1 + 1 \cdot 0.5 - 7 \cdot 1$; PPC $p_2 = 1$; expected utility $c_2(v_2 - p_2) = 0.5 \cdot (6 - 1)$.]

FIGURE 15.7. VCG on sponsored search example: An advertiser’s expected 
value for a slot is her value per click times the clickthrough rate of the slot. 
For example, bidder 2’s expected value for slot 1 is 6, and her expected value 
for slot 2 is 6-0.5 = 3. Her expected payment is the value other players obtain 
if she wasn’t there (7 -1+ 1-0.5) (since bidder 3 would get the second slot in 
her absence) minus the value the other players get when she is present (7 - 1). 
Her expected payment is the price-per-click (PPC) times the clickthrough rate. 


[Figure 15.8 plot: clickthrough rate as a function of the bid. Expected payment in allocated slot $= b_1(c_1 - c_2) + b_2(c_2 - c_3) + b_3 c_3$.]


FIGURE 15.8. Suppose that in an ad auction there are 4 bidders, and 3 slots, 
with clickthrough rates $c_1 > c_2 > c_3$. The first three have submitted bids
$b_1 > b_2 > b_3$. The figure shows the allocation and payment for the fourth
bidder as a function of his bid b. The piecewise linear black curve shows the
probability that the fourth bidder gets a click as a function of his bid. The
blue area is his expected VCG payment when he bids $b > b_1$. The payment
per click for this bidder is the blue area divided by the clickthrough rate of
the slot he is assigned to, in this case $c_1$. His expected utility for this outcome
is the pink area. As indicated below the graph, if he bids between $b_{i+1}$ and
$b_i$, he gets slot $i + 1$. If he bids below $b_3$, he gets no slot.


If bidder i submits the $j$th highest bid $b_i$ to the VCG auction A and wins slot
j, the expected value he will obtain is

\[ c_j v_i = \sum_{\ell=j}^{k} (c_\ell - c_{\ell+1}) v_i, \]

where $c_{k+1} = 0$. The right-hand side is also the sum of the values he obtains from
bidding $b_i$ in each of the auctions $A_1, \ldots, A_k$, assuming that the other bidders bid
as they did in A; in this case, he will win in $A_j, \ldots, A_k$.
Similarly, his expected payment in A is the sum of the corresponding payments
in these auctions; that is,

\[ c_j p_j(\mathbf{b}) = \sum_{\ell=j}^{k} (c_\ell - c_{\ell+1}) b_{\ell+1}, \]

where $b_\ell$ is the $\ell$th largest bid.


LEMMA 15.5.2. The VCG auction for sponsored search auctions is truthful. 
Moreover, if bidder i bids truthfully, then he does not envy any other bidder j; i.e., 


\[ c_i(v_i - p_i) \ge c_j(v_i - p_j). \]

(We take $c_j = p_j = 0$ if $j > k$.)


PROOF. The utility of a bidder i in the sponsored search auction A is the sum
of his utilities in the k auctions $A_1, \ldots, A_k$. Since bidding truthfully maximizes
his utility in each auction separately, it also maximizes his utility in the combined
auction. Similarly, since he is not envious of any other bidder j in any of the
auctions $A_1, \ldots, A_k$, he is not envious in the combined auction.
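
The envy-freeness claim of Lemma 15.5.2 can be spot-checked numerically. The sketch below assumes truthful bidding with values already sorted in decreasing order; the helper names `vcg_prices`, `envy_table` and `is_envy_free` and the second instance are illustrative choices. A small tolerance is used because each bidder is exactly indifferent between his own slot and the one above it, so ties are expected.

```python
def vcg_prices(bids, ctrs):
    """Per-click VCG prices (15.4) for bids sorted in decreasing order."""
    k = min(len(ctrs), len(bids))
    b = list(bids) + [0.0]
    c = list(ctrs[:k]) + [0.0]             # convention c_{k+1} = 0
    return [sum(b[t] * (c[t - 1] - c[t]) for t in range(s + 1, k + 1)) / c[s]
            for s in range(k)]


def envy_table(values, ctrs):
    """table[i][j] = c_j (v_i - p_j), with c_j = p_j = 0 beyond the last slot."""
    n = len(values)
    p = vcg_prices(values, ctrs)
    k = len(p)
    c = list(ctrs[:k]) + [0.0] * (n - k)
    p = p + [0.0] * (n - k)
    return [[c[j] * (values[i] - p[j]) for j in range(n)] for i in range(n)]


def is_envy_free(values, ctrs, tol=1e-9):
    u = envy_table(values, ctrs)
    n = len(values)
    return all(u[i][i] >= u[i][j] - tol for i in range(n) for j in range(n))


print(envy_table([7, 6, 1], [1.0, 0.5]))
print(is_envy_free([7, 6, 1], [1.0, 0.5]))
print(is_envy_free([10, 4, 3, 2], [0.8, 0.5, 0.1]))
```

Row i of the table lists bidder i's utility from each bidder's allocation and price; envy-freeness says the diagonal entry is the largest in every row.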


15.5.2. Generalized second-price mechanism. Search engines and other 
Internet companies run millions of auctions every second. Some companies use 
VCG, e.g., Facebook, but there is another format that is more popular, known as
the generalized second-price (GSP) mechanism, so named because it generalizes
the Vickrey second-price auction.

A simplified version of the GSP auction works as follows: 


• Each advertiser interested in bidding on keyword K submits a bid $b_i$,
indicating the price he is willing to pay per click.

• Each ad is ranked according to its bid $b_i$ and ads are allocated to slots in this
order.

• Each winning advertiser pays the minimum bid needed to win the allo-
cated slot. For example, if the advertisers are indexed according to the
slot they are assigned to, with advertiser 1 assigned to the highest slot
(slot 1), then advertiser i’s payment $p_i$ is


\[ p_i = b_{i+1}. \]

(Without loss of generality, there are more advertisers than slots. If not,
add dummy advertisers with bids of value 0.)


When there is only one slot, GSP is the same as a second-price auction. How-
ever, when there is more than one slot, GSP is no longer truthful, as Figure 15.9
shows.


FIGURE 15.9. The top of the figure shows an execution of GSP when adver-
tisers bid truthfully. Bidding truthfully is not an equilibrium though. For
example, if bidder 1 reduces his bid to 5, as shown in the bottom figure, then
he gets allocated to the second slot instead of the first, but his utility is higher.
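
A small sketch of the simplified GSP rule makes the failure of truthfulness concrete. The values 7, 6, 1 and clickthrough rates 1, 0.5 are borrowed from the caption of Figure 15.7 and are an assumption here; they are not necessarily the numbers drawn in Figure 15.9. The function names are hypothetical.

```python
def gsp(bids, ctrs):
    """Simplified GSP: rank by bid; the advertiser in each slot pays the next-highest bid.

    Returns {bidder index: (0-based slot, price per click)} for the filled slots.
    """
    order = sorted(range(len(bids)), key=lambda i: -bids[i])
    ranked = [bids[i] for i in order] + [0.0]   # dummy zero bid below everyone
    k = min(len(ctrs), len(bids))
    return {order[s]: (s, ranked[s + 1]) for s in range(k)}


def gsp_utility(bidder, value, bids, ctrs):
    """Expected utility per search: clickthrough rate times (value - price)."""
    outcome = gsp(bids, ctrs)
    if bidder not in outcome:
        return 0.0
    slot, price = outcome[bidder]
    return ctrs[slot] * (value - price)


ctrs = [1.0, 0.5]
print(gsp_utility(0, 7, [7, 6, 1], ctrs))  # truthful: top slot at price 6
print(gsp_utility(0, 7, [5, 6, 1], ctrs))  # shading to 5: second slot at price 1
```

With these numbers the first bidder's utility rises from 1.0 to 3.0 when he lowers his bid from 7 to 5, so truthful bidding is not a best response.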


Although GSP is not truthful, one of its Nash equilibria precisely corresponds 
to the outcome of VCG in the following sense. 


LEMMA 15.5.3. Consider n competing advertisers with values $v_i$ sorted so that
$v_1 \ge v_2 \ge \cdots \ge v_n$. Assuming truthful bidding in the VCG auction, from (15.4) we
have that bidder i’s price-per-click is

\[ p_i^{VCG} = \sum_{j=i+1}^{k+1} v_j \, \frac{c_{j-1} - c_j}{c_i}. \]   (15.5)

Then, in GSP, it is a Nash equilibrium for these advertisers to bid $(b_1, \ldots, b_n)$
where $b_1 > p_1^{VCG}$ and $b_i = p_{i-1}^{VCG}$ for $i \ge 2$.


REMARK 15.5.4. For an example of this bidding strategy, see Figure 15.10.


FIGURE 15.10. The top figure illustrates the allocation and payments using
VCG. The bottom figure illustrates the allocation and payments in GSP when 
bidders bid so as to obtain the same allocation and payments. Bidding this 
way is a Nash equilibrium for GSP. 


PROOF. If the advertisers bid $(b_1, \ldots, b_n)$, then the allocation and payments in
GSP are exactly those of VCG. Moreover, if each bidder $\ell \ne i$ bids $b_\ell$, then bidding
$b_i$ is a best response for bidder i: If he bids $b_i' \in (b_{i+1}, b_{i-1})$, then he keeps slot i
at the same price, so his utility will not change. Otherwise, if he bids $b_i'$ and that
yields him a different slot j, then it will be at a price $p_j^{VCG}$, which cannot increase
his utility by the envy-free property of VCG (Lemma 15.5.2).


15.5.2.1. Advertisers differ. Search engines that run sponsored search auctions
take into account the fact that users prefer certain advertisers to
others. To model this, let $f_i$ be an appeal factor for advertiser i that affects the
probability that a user will click on an ad from that advertiser. Specifically, we
assume that the clickthrough rate of advertiser i in slot j has the form $f_i c_j$. In
GSP, bidders are first reordered so that $b_1 f_1 \ge b_2 f_2 \ge \cdots \ge b_n f_n$. Bidder i is then
assigned to slot i, and he is charged a price-per-click which is the minimum bid
required to retain his slot:

$$p_i = \frac{b_{i+1} f_{i+1}}{f_i}.$$

See the exercises for further details.


15.6. Back to revenue maximization 


EXAMPLE 15.6.1. Trips to the moon: A billionaire is considering selling
tours to the moon. The cost of building a rocket is T. There are $n_0$ people that
have declared an interest in the trip. The billionaire wishes to set prices that will
recover his cost but does not have good information about the distribution of values
the people (bidders) have. Therefore he runs the following auction: Let $n_1$ be the
number of bidders willing to pay $T/n_0$. If $n_1 = n_0$, the auction ends with the sale
of a ticket to each of the $n_0$ bidders at price $T/n_0$. If $n_1 < n_0$, let $n_2$ be the number
of bidders willing to pay $T/n_1$. Iterating, if $n_{j+1} = n_j$ for some j, then the auction
terminates with a sale to each of these $n_j$ bidders at a price of $T/n_j$. Otherwise,
some $n_j$ is 0 and the auction terminates with no sale.
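
The iteration in this example is short enough to write out. The following is a minimal sketch, assuming integer bids and an integer cost T so that the shares $T/n_j$ can be kept exact; the function name `moon_auction` and the sample bids are ours.

```python
from fractions import Fraction


def moon_auction(bids, cost):
    """Ascending cost-sharing auction of Example 15.6.1.

    Returns (number of winners, price per winner), or (0, None) if no sale.
    """
    n = len(bids)                                     # n_0: everyone interested
    while n > 0:
        price = Fraction(cost, n)                     # equal share T / n_j
        willing = sum(1 for b in bids if b >= price)  # n_{j+1}
        if willing == n:                              # nobody dropped out: sell
            return n, price
        n = willing
    return 0, None


# Hypothetical bids (10, 8, 5, 3): with T = 16 two bidders share the cost at 8 each;
# with T = 30 no group can cover the cost.
print(moon_auction([10, 8, 5, 3], 16))
print(moon_auction([10, 8, 5, 3], 30))
```

Because the offered price only rises from round to round, the set of willing bidders can only shrink, so the loop stops after at most $n_0$ rounds.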


EXERCISE 15.b. Show that it is a dominant strategy to be truthful in the
auction of Example 15.6.1. Also, show that if the bidders are truthful, the auction
finds the largest set of bidders that can share the target cost T equally, if there is
one.


15.6.1. Revenue maximization without priors. In the previous chapter, 
we assumed that the auctioneer knew the prior distributions from which bidders’ 
values are drawn. In this section, we present an auction that is guaranteed to 
achieve high revenue without any prior information about the bidders. We do this 
in the context of a digital goods auction. These are auctions⁸ to sell digital goods
such as mp3s, digital video, pay-per-view TV, etc. A characteristic feature of
digital goods is that the cost of reproducing the items is negligible and therefore 
the auctioneer effectively has an unlimited supply of the items. 

For digital goods auctions, the VCG mechanism allocates to all of the bidders 
and charges them all nothing! Thus, while VCG perfectly maximizes social surplus, 
it can be disastrous when the goal is to maximize revenue. We present a truthful 
auction that does much better. 


DEFINITION 15.6.2. The optimal fixed-price revenue that can be obtained
from bidders with bid vector $b = (b_1, b_2, \ldots, b_n)$ is

$$R^*(b) = \max_p \; p \cdot |\{i : b_i \ge p\}|. \qquad (15.6)$$

Let $p^*(b)$ denote the (smallest) optimal fixed price, i.e., the smallest p where
the maximum in (15.6) is attained.


Equivalently, $R^*(b)$ can be defined as follows: Reorder the bidders so that
$b_1 \ge b_2 \ge \cdots \ge b_n$. Then

$$R^*(b) = \max_{k \ge 1} \; k \cdot b_k. \qquad (15.7)$$
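
Formula (15.7) translates directly into code. The sketch below is ours (the function name and the sample bid vectors are assumptions); it returns both $R^*(b)$ and the smallest optimal price $p^*(b)$.

```python
def optimal_fixed_price(bids):
    """Return (R*(b), p*(b)) by maximizing k * b_k over the sorted bids, as in (15.7)."""
    ordered = sorted(bids, reverse=True)
    best_revenue, best_price = 0, None
    for k, b_k in enumerate(ordered, start=1):
        # Bids are scanned in decreasing order, so ">=" keeps the smallest
        # price among ties, matching the definition of p*(b).
        if k * b_k >= best_revenue:
            best_revenue, best_price = k * b_k, b_k
    return best_revenue, best_price


print(optimal_fixed_price([20, 10, 5, 5, 5, 5, 1]))
print(optimal_fixed_price([10, 8, 5, 3]))
```

Only the bids themselves need to be tried as prices: raising p to the next bid above it never loses a buyer.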


8 Economists call this the “monopoly pricing problem with constant marginal cost”.

If the auctioneer knew the true values $v = (v_1, \ldots, v_n)$ of the agents, he could easily
obtain a revenue of $R^*(v)$ by setting the price to $p^*(v)$. But the auction which
asks the agents to submit bids and then sets the price to $p^*(b)$ is not truthful. In
fact, no auction can approximate $R^*(v)$ for all v.


CLAIM 15.6.3. Fix $\delta > 0$. No (randomized) auction can guarantee expected
revenue which is at least $\delta \cdot R^*(v)$ for all v.


PROOF. Let n = 1, and suppose that $V_1$ is drawn from the distribution
$F(x) = 1 - 1/x$, for $x \ge 1$. Then by Theorem 14.9.3, no auction can obtain revenue more
than 1. On the other hand, $E[R^*(V_1)] = \infty$.


The difficulty in the claim arose from a single high bidder. This motivates the
goal of designing an auction that achieves a constant fraction of the revenue

$$R_2^*(b) := \max_{k \ge 2} \; k \cdot b_k.$$


Since there are no value distributions assumed, we seek to maximize revenue in
mechanisms that admit dominant strategies. By a version of the Revelation Principle
(see the Notes of Chapter 14), it suffices to consider truthful mechanisms. To
obtain this, we can ensure that each agent is offered a price which does not depend
on her own bid. The following auction is a natural candidate.


The deterministic optimal price auction (DOP):

For each bidder i, compute $t_i = p^*(b_{-i})$, the optimal fixed
price for the remaining bidders, and use that as the threshold
bid for bidder i.


Unfortunately, this auction does not work well, as the following example shows. 


EXAMPLE 15.6.4. Consider a group of bidders of which 11 bidders have value
100 and 1,001 bidders have value 1. The best fixed price is $100: at that price, 11
items can be sold for a total revenue of $1,100. (The only plausible alternative is
to sell to all 1,012 bidders at price $1, which would result in a lower revenue.)

However, if we run the DOP auction on this bid vector, then for each bidder
of value 100, the threshold price that will be used is $1 (the other ten high bidders
are worth only $1,000 at price $100, less than the $1,011 available at price $1), whereas
for each bidder of value 1, the threshold price is $100, for a total revenue of only $11!
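
The numbers in this example are easy to reproduce. The sketch below is a direct, unoptimized implementation of DOP written for illustration; the function names are ours, and ties between optimal prices are resolved toward the smaller price, as in the definition of $p^*$.

```python
def optimal_price(bids):
    """Smallest optimal fixed price p*(b) (0 for an empty bid vector)."""
    ordered = sorted(bids, reverse=True)
    best_revenue, best_price = 0, 0
    for k, b_k in enumerate(ordered, start=1):
        if k * b_k >= best_revenue:   # ">=" prefers the smaller price on ties
            best_revenue, best_price = k * b_k, b_k
    return best_price


def dop_revenue(bids):
    """Revenue of the deterministic optimal price auction on truthful bids."""
    revenue = 0
    for i, b_i in enumerate(bids):
        threshold = optimal_price(bids[:i] + bids[i + 1:])  # t_i = p*(b_{-i})
        if b_i >= threshold:          # bidder i accepts the offer t_i
            revenue += threshold
    return revenue


bids = [100] * 11 + [1] * 1001
print(dop_revenue(bids))              # compare with R*(b) = 1,100
```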


In fact, the DOP auction can obtain arbitrarily poor revenue compared to
$R_2^*(v)$. The key to overcoming this problem is to use randomization.


15.6.2. Revenue extraction. A key ingredient in the auction we will develop
is the notion of a revenue extractor (discussed in the context of trips to the moon,
Example 15.6.1).


DEFINITION 15.6.5 (A revenue extractor). The revenue extractor $pe_T(b)$
with target revenue T sells to the largest set of k bidders that can equally share
the cost T and charges each T/k. If there is no such set, the revenue is \$0.


Using the ascending auction procedure discussed in Example 15.6.1, we obtain
the following:


LEMMA 15.6.6. The revenue extractor $pe_T$ is truthful and guarantees a revenue
of T on any b such that $R^*(b) \ge T$.


See Exercise 15.8 for an alternative implementation and proof.


Licensed to AMS. 
License or copyright restrictions may apply to redistribution; see http://www.ams.org/publications/ebooks/terms 




15.6.3. An approximately optimal auction. 


DEFINITION 15.6.7 (RSRE). The random sampling revenue extraction
auction (RSRE) works as follows:
(1) Randomly partition the bids b into two groups by flipping a fair coin for
each bidder and assigning her bid to $b'$ or $b''$.
(2) Compute the optimal fixed-price revenues $T' := R^*(b')$ and $T'' := R^*(b'')$.
(3) Run the revenue extractors: $pe_{T'}$ on $b''$ and $pe_{T''}$ on $b'$. Thus, the target
revenue for $b''$ is determined by $b'$ and vice versa.


[Figure 15.11 diagram: the bids are split into $b'' = (20, 10, 5, 5, 5, 5, 1)$, with OFP$(b'') = 30$, and $b' = (10, 8, 5, 3)$, with OFP$(b') = 16$; the extractor $pe_{30}$ is run on $b'$ and $pe_{16}$ on $b''$.]


FIGURE 15.11. This figure illustrates a possible execution of the RSRE auction
when the entire set of bids is (20, 10, 10, 8, 5, 5, 5, 5, 5, 3, 1). Running the
revenue extractor $pe_{30}(b')$ will not sell to anyone. Running the revenue extractor
$pe_{16}(b'')$ will sell to the top six bidders at a price of 16/6.
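
The execution in Figure 15.11 can be reproduced with a short program. This is a sketch under stated assumptions: the function names are ours, bids are taken to be integers so that prices T/k stay exact, and the revenue extractor is implemented by directly searching for the largest feasible k rather than by the ascending procedure of Example 15.6.1.

```python
import random
from fractions import Fraction


def optimal_fixed_price_revenue(bids):
    """R*(b) = max_k k * b_k over bids sorted in decreasing order (0 if empty)."""
    ordered = sorted(bids, reverse=True)
    return max((k * b for k, b in enumerate(ordered, start=1)), default=0)


def revenue_extractor(bids, target):
    """pe_T: sell to the largest k bidders who can each pay T/k.

    Returns (number of winners, price), or (0, None) if there is no such set.
    """
    ordered = sorted(bids, reverse=True)
    for k in range(len(ordered), 0, -1):
        price = Fraction(target, k)
        if ordered[k - 1] >= price:      # the k highest bids all cover T/k
            return k, price
    return 0, None


def rsre_on_partition(b1, b2):
    """Revenue of RSRE for one fixed partition (b', b'') of the bids."""
    t1 = optimal_fixed_price_revenue(b1)     # T'
    t2 = optimal_fixed_price_revenue(b2)     # T''
    k2, p2 = revenue_extractor(b2, t1)       # target for b'' comes from b'
    k1, p1 = revenue_extractor(b1, t2)       # target for b' comes from b''
    return (k1 * p1 if k1 else 0) + (k2 * p2 if k2 else 0)


def rsre(bids, rng=random):
    """One run of RSRE: a fair coin flip per bidder, then extract revenue."""
    b1, b2 = [], []
    for b in bids:
        (b1 if rng.random() < 0.5 else b2).append(b)
    return rsre_on_partition(b1, b2)


# The partition shown in Figure 15.11.
print(revenue_extractor([10, 8, 5, 3], 30))
print(revenue_extractor([20, 10, 5, 5, 5, 5, 1], 16))
print(rsre_on_partition([10, 8, 5, 3], [20, 10, 5, 5, 5, 5, 1]))
```

On this partition the total revenue equals the smaller of the two targets, min(16, 30) = 16.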


REMARK 15.6.8. It seems more natural to compute the price $p^*(b')$ and use
that as a threshold bid for the bidders corresponding to $b''$ and vice versa. The
analysis of this auction, known as the Random Sampling Optimal Price (RSOP)
auction, is more delicate. See the notes.


THEOREM 15.6.9. The random sampling revenue extraction (RSRE) auction is
truthful, and for all bid vectors b, the expected revenue of RSRE is at least $R_2^*(b)/4$.
Thus, if bidders are truthful, this auction extracts at least $R_2^*(v)/4$ in expectation.


PROOF. The RSRE auction is truthful since it is simply randomizing over
truthful auctions, one for each possible partition of the bids. (Note that any target
revenue used in step (3) of the auction is independent of the bids to which it
is applied.) So we only have to lower bound the revenue obtained by RSRE on
each input b. The crucial observation is that for any particular partition of the
bids, the revenue of RSRE is at least $\min(T', T'')$. Indeed, if, say, $T' \le T''$, then
$R^*(b'') = T''$ is large enough to ensure that $pe_{T'}(b'')$ will extract a revenue of $T'$.

Thus, we just need to analyze $E[\min(T', T'')]$. Assume that $R_2^*(b) = kp^*$ has
$k \ge 2$ winners at price $p^*$. Of these k winners, suppose that $k'$ are in $b'$ and $k''$ are
in $b''$. Thus, $T' \ge k'p^*$ and $T'' \ge k''p^*$. Therefore,

$$\frac{E[\min(T', T'')]}{kp^*} \ge \frac{E[\min(k'p^*, k''p^*)]}{kp^*} = \frac{E[\min(k', k'')]}{k} \ge \frac{1}{4}$$

by Claim 15.6.10.

CLAIM 15.6.10. If $k \ge 2$ and $X \sim \mathrm{Bin}(k, \tfrac{1}{2})$, then $E[\min(X, k - X)] \ge k/4$.


PROOF. For k = 2, 3, it is easy to check that the claim holds with equality.
Suppose $k \ge 4$. Then $\min(X, k - X) = \frac{k}{2} - |X - \frac{k}{2}|$ and, e.g., by Appendix C (12),

$$E[|X - k/2|] \le \sqrt{\mathrm{Var}\, X} = \frac{\sqrt{k}}{2} \le \frac{k}{4}.$$
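
The claim is also easy to confirm by exact computation for small k. The following sketch is ours (the function name and the range of k tested are arbitrary choices); it sums over the binomial distribution with exact rational arithmetic.

```python
from fractions import Fraction
from math import comb


def expected_min(k):
    """Exact E[min(X, k - X)] for X ~ Bin(k, 1/2)."""
    return sum(Fraction(min(i, k - i) * comb(k, i), 2 ** k) for i in range(k + 1))


for k in range(2, 9):
    # Print k, the exact expectation, and the lower bound k/4 from the claim.
    print(k, expected_min(k), Fraction(k, 4))
```

The ratio $E[\min(X, k - X)]/k$ is exactly 1/4 for k = 2, 3 and larger for the other values tested, so the factor 1/4 in Theorem 15.6.9 is driven by the smallest cases.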


REMARK 15.6.11. Notice that if the bidders actually had values i.i.d. from a
distribution F, then the optimal auction would be to offer each bidder the price p
that maximizes $p(1 - F(p))$. Thus, the optimal auction would in fact be a fixed-price
auction.


Notes 


The VCG mechanism is named for Vickrey [Vic61], Clarke, and Groves [Gro79].
The most general version of VCG we present in this book is in Chapter 16. See the chapter
notes there for more on VCG, the related theory, and its applications. In this chapter, we
have focused on truthful single-parameter mechanism design, wherein each bidder’s value
for an allocation depends on only a single private real parameter $v_i$; e.g., the bidder has
value $v_i$ for winning and value 0 for losing.

The descending auction for buying a minimum spanning tree is due to Bikhchandani
et al. [BdVSV11]. In §15.4.3, we observed that for public projects, the VCG mechanism
does not achieve budget balance. That is, the mechanism designer (the government) did
not necessarily recover the cost of building the library. Green and Laffont showed
that no truthful mechanism can simultaneously balance the budget and maximize social
surplus.


FIGURE 15.12. Envy 


Note that Definition 15.4.1 doesn’t capture all types of envy; e.g., the possibility of
one agent envying another agent’s value function is not addressed. For instance, consider
two diners in a restaurant. One orders the $100 lobster. The second cannot afford that
and orders rice and beans for $10. By our formal definition, this situation is envy-free,
but clearly it is not.

The model and results on sponsored search auctions are due to Edelman, Ostrovsky,
and Schwarz [EOS07], and Varian [Var07]. Sponsored search auctions date back to the
early 1990s: The idea of allowing advertisers to bid for keywords and charging them 
per click was introduced by Overture (then GoTo) in 1994. In this early incarnation, 
the mechanism was “first-price”: An advertiser whose ad was clicked on paid the search 
engine his bid. In the late 1990s, Yahoo! and MSN implemented the Overture scheme as 
well. However, the use of first-price auctions was observed to be unstable, with advertisers 
needing to constantly update their bids to avoid paying more than their value. In 2002, 
Google adopted the generalized second price mechanism. Yahoo!, MSN, and Microsoft 
followed suit. For a brief history of sponsored search as well as results and research 
directions related to sponsored search, see [JM08] and Chapter 28 in [Nis07].

Facebook is one of the only companies selling ads online via auctions that uses the 
VCG mechanism [Met15]. Possible reasons that VCG isn’t widely used are (a) it is 
relatively complicated for the advertisers to understand and (b) optimizing social surplus 
may not be the objective. The obvious goal for the search engines is to maximize long- 
term profit. However, it is not clear what function of the current auction is the best proxy 
for that. Besides profit and social surplus, other key parameters are user and advertiser 
satisfaction. 

In the previous chapter, we assumed the auctioneer knew the prior distributions from 
which agents’ values are drawn. An attractive alternative is to design an auction whose 
performance does not depend on knowledge of priors. Indeed, where do priors come from? 
Typically, they come from previous similar interactions. This is problematic when markets 
are small, when interactions are novel or when priors change over time. Another difficulty 
is that agents may alter their actions to bias the auctioneer’s beliefs about future priors 
in their favor. This is the motivation for the material in §15.6.

The first prior-free digital goods auctions and the notion of prior-free analysis of 
auctions were introduced by Goldberg, Hartline, and Wright [GHW01]. Their paper shows 
that any deterministic truthful auction that treats the bidders symmetrically will fail to 
consistently obtain a constant fraction of the optimal fixed-price revenue, thus motivating 
the need for randomization. (Aggarwal et al. showed that this barrier can be 
circumvented using an auction that treats the bidders asymmetrically.) 

The RSOP auction (see Remark 15.6.8) proposed by was first analyzed
in [GHK+06]; its analysis was improved in a series of papers culminating in [AMS09].
The random sampling revenue extraction auction presented here is due to Fiat, Goldberg,
Hartline, and Karlin [FGHK02]. The key building block, revenue extraction, is due to
Moulin and Shenker [MS01].

Stronger positive results and further applications of the prior-free framework for revenue
maximization and cost minimization are surveyed in Chapter 13 of and in
Chapter 7 of [Har12]. The strongest positive results known for prior-free digital goods
auctions are due to Chen, Gravin, and Lu. They showed that the lower bound of
2.42 on the competitive ratio of any digital goods auction, proved in [GHKS04], is tight.


The problem considered in Exercise 15.1 was studied in [MS83]. Exercise 15.3 is
from [Nis99], Exercise 15.2 is from [NR01], and Exercise 15.10 is the analogue of
Theorem 14.6.1, which is due to [Mye81]. See also [GL77].


Exercises 


15.1. A seller has an item that he values at $v_s \in [0,1]$ and a buyer has a value
$v_b \in [0,1]$ for this same item. Consider designing a mechanism for determining
whether or not the seller should transfer the item to the buyer.
Social surplus is maximized by transferring the item if $v_b > v_s$ and not
transferring it otherwise. How do the VCG payments depend on the values
$v_s$ and $v_b$?


15.2. Consider a communication network, where each link is owned by a different
company (henceforth, bidder). Each bidder’s private information is the cost
of routing data along that link. A procurement auction to buy the use of
a path between two specific nodes s and t is run: Each company submits
a bid representing the minimum price that company is willing to be paid
for the use of its link. The auction consists of choosing the minimum cost
path between s and t and then paying each bidder (link owner) along that
path the maximum amount he could have bid while still being part of the
minimum cost path.

• Show that this auction is truthful.

• Construct an example in which the amount paid by the auctioneer is
$\Omega(n)$ times as large as the actual cost of the shortest path. Here n is
the number of links in the network.


15.3. There are n computers connected in a line (with computer i, with $1 < i < n$,
connected to computers $i - 1$ and $i + 1$). Each computer has private value
$v_i$ for executing a job; however, a computer can only successfully execute
the job with access to both of its neighboring links. Thus, no two adjacent
computers can execute jobs. Consider the following protocol:

• In the first, left-to-right, phase, each computer places a bid $r_i$ for
the link on its right, where $r_1 = v_1$ and $r_i = \max(v_i - r_{i-1}, 0)$ for
$1 < i < n$.

• In the next, right-to-left, phase, each computer places a bid $\ell_i$ for the
link on its left, where $\ell_n = v_n$ and $\ell_i = \max(v_i - \ell_{i+1}, 0)$.

Computer i can execute its task if and only if $\ell_i > r_{i-1}$ and $r_i > \ell_{i+1}$. In
this case, it “wins” both links, and its payment is $r_{i-1} + \ell_{i+1}$.

(a) Show that this mechanism maximizes social surplus; that is, the set
of computers selected has the maximum total value among all subsets
that do not contain any adjacent pairs.

(b) Show that, under the assumption that any misreports by the computers
are consistent with having a single value $v_i'$, it is in each computer’s
best interest to report truthfully.

(c) Show that if the computer can misreport arbitrarily in the two phases,
then it is no longer always a dominant strategy to report truthfully;
however, reporting truthfully is a Nash equilibrium.


15.4. Consider the descending price auction described in §15.4.2. Prove that the
outcome and payments are the same as those of the VCG auction and that
it is a dominant strategy for each edge (bidder) to stay in the auction as
long as the current price is above his cost.

15.5. Consider a search engine selling advertising slots on one of its pages. There
are three advertising slots with publicly known clickthrough rates (probability
that an individual viewing the webpage will click on the ad) of 0.08,
0.03, and 0.01, respectively, and four advertisers whose values per click
are 10, 8, 2, and 1, respectively. Assume that the expected value for an
advertiser to have his ad shown in a particular slot is his value times the
clickthrough rate. What is the allocation and what are the payments if the
search engine runs VCG? How about GSP?


15.6. Consider the following model for a keyword auction, slightly more general
than the version considered in the text. There are k slots and n bidders.
Each bidder has a private value $v_i$ per click and a publicly known quality
$q_i$. The quality measures the bidder (ad) dependent probability of being
clicked on. Assume that the slots have clickthrough rates $c_1 \ge c_2 \ge \cdots \ge c_k$
and that the expected value to bidder i to be shown in slot j is $q_i c_j v_i$.
Determine the VCG allocation and per click payments for this setting. (Note
that the expected utility of bidder i if allocated slot j at price-per-click of
$p_{ij}$ is $q_i c_j (v_i - p_{ij})$.)


15.7. Consider the GSP auction in the setting of the previous exercise, where
bidders are assigned to slots in decreasing order of $b_i q_i$, where $b_i$ is
advertiser i’s bid, with a bidder’s payment being the minimum bid he could
make to retain his slot. Prove an analogue of Lemma 15.5.3 for this setting.


15.8. Consider the following implementation of the profit extractor $pe_T$.
Given the reported bids, for each bidder i separately, the auctioneer
pretends that bidder’s bid is $\infty$ and then, using bid vector $(\infty, b_{-i})$,
determines the largest k such that k bidders can pay T/k. This price is
then offered to bidder i, who accepts if $b_i \ge T/k$ and rejects otherwise.
With this formulation it is clear that the auction is truthful. However, it
is less obvious that this implements the same outcome. Show that it does.


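15.9. Prove the exact formula for RSRE:

$$E[\min(k', k'')] = \sum_{0 \le i \le k} \min(i, k - i) \binom{k}{i} 2^{-k} = k \left( \frac{1}{2} - \binom{k-1}{\lfloor k/2 \rfloor} 2^{-k} \right).$$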


15.10. Let A be an auction for a win/lose setting defined by U and $\mathcal{L}$, and suppose
that $\alpha_i[b]$ is the probability that bidder i wins when the bids are
$b = (b_1, \ldots, b_n)$. (This expectation is taken solely over the randomness in the
auction.) Assume that, for each bidder, $v_i \in [0, \infty)$. Prove that it is a
dominant strategy in A for bidder i to bid truthfully if and only if, for any
bids $b_{-i}$ of the other bidders, the following holds:

(a) The expected allocation $\alpha_i[v_i, b_{-i}]$ is (weakly) increasing in $v_i$.

(b) The expected payment of bidder i is determined by the expected
allocation up to an additive constant:

$$p_i[v_i, b_{-i}] = v_i \cdot \alpha_i[v_i, b_{-i}] - \int_0^{v_i} \alpha_i[z, b_{-i}] \, dz + p_i[0, b_{-i}].$$

Hint: The proof is analogous to that of Theorem 14.6.1.


15.11. Show that the allocation and payment rule of the VCG mechanism for 
maximizing social surplus in win/lose settings satisfies conditions (a) and 
(b) of the previous exercise. 


15.12. Generalize the result of Exercise 15.10 to the following setting:

• Each bidder has a private value $v_i \ge 0$.
• Each outcome is a vector $q = (q_1, \ldots, q_n)$, where $q_i$ is the “quantity”
allocated to bidder i. (In a win/lose setting, each $q_i$ is either 0 or
1.) Denote by Q the set of feasible outcomes (generalizing $\mathcal{L}$ from
win/lose settings).
• A bidder with value $v_i$ who receives an allocation of $q_i$ and is charged
$p_i$ obtains a utility of $v_i q_i - p_i$.

Now let A be an auction for an allocation problem defined by U and Q,
and suppose that $\alpha_i[b]$ is the expected quantity allocated to bidder i when
the bids are $b = (b_1, \ldots, b_n)$. (This expectation is taken solely over the
randomness in the auction.) Assume that, for each bidder, $v_i \in [0, \infty)$.
Show that it is a dominant strategy in A for bidder i to bid truthfully if
and only if, for any bids $b_{-i}$ of the other bidders, the following holds:
(a) The expected allocation $\alpha_i[v_i, b_{-i}]$ is (weakly) increasing in $v_i$.
(b) The expected payment of bidder i is determined by the expected
allocation up to an additive constant:

$$p_i[v_i, b_{-i}] = v_i \cdot \alpha_i[v_i, b_{-i}] - \int_0^{v_i} \alpha_i[z, b_{-i}] \, dz + p_i[0, b_{-i}].$$
0 


15.13. Consider a single-item auction, but with the following alternative bidder
model. Suppose that each of n bidders has a signal, say $s_i$ for bidder i,
and suppose that agent i’s value $v_i := v_i(s_1, \ldots, s_n)$ is a function of all the
signals. This captures scenarios where each bidder has different information
about the item being auctioned and weighs all of these signals, each in his
or her own way.

Suppose that there are two bidders, where $v_1(s_1, s_2) = s_1$ and $v_2(s_1, s_2) = s_1^2$.
Assume that $s_1 \in [0, \infty)$. Show that there is no truthful social surplus
maximizing auction for this example. (A social surplus maximizing auction
must allocate the item to the bidder with the higher value, so to bidder 2
when $s_1 > 1$ and to bidder 1 when $s_1 < 1$.)


VCG and scoring rules


In the previous chapter we studied the design of truthful auctions in a setting 
where each bidder was either a winner or a loser in the auction. In this chapter, 
we explore the more general setting of mechanism design. Mechanisms allow 
for the space of outcomes to be richer than, say, simply allocating items to bidders. 
The goal of the mechanism designer is to design the game (i.e., the mechanism) so 
that, in equilibrium, desirable outcomes are achieved. 


16.1. Examples 


EXAMPLE 16.1.1 (Spectrum auctions). In a spectrum auction, the govern- 
ment allocates licenses for the use of some band of electromagnetic spectrum in a 
certain geographic area. The participants in the auction are cell phone companies 
who need such licenses to operate. Each company has a value for each combination 
of licenses. The government wishes to design a procedure for allocating and pricing 
the licenses that maximizes the cumulative value of the outcome to all participants. 
What procedure should be used? 


FIGURE 16.1. A spectrum auction. 


EXAMPLE 16.1.2 (Building roads). The state is trying to determine which 
roads to build to connect a new city C to cities A and B (which already have a 
road between them). The options are to build a road from A to C or a road from 
B to C, both roads, or neither. Each road will cost the state $10 million to build. 
Each city obtains a certain economic/social benefit for each outcome. For example, 
city A might obtain a $5 million benefit from the creation of a road to city C, 
but no real benefit from the creation of a road between B and C. City C, on the 
other hand, currently disconnected from the others, obtains a significant benefit
($9 million) from the creation of each road, but the marginal benefit of adding a
second connection is not as great as the benefit of creating a first connection. The
following table summarizes these values (in millions of dollars), and the cost to the
state for each option. The final row is the social surplus, the sum of the values to
the three cities plus the value to the state.


                  road A-C   road B-C   both   none
city A                5          0        5      0
city B                0          5        5      0
city C                9          9       15      0
state               -10        -10      -20      0
social surplus        4          4        5      0


The state’s goal might be to choose the option that maximizes social surplus, 
which, for these numbers, is the creation of both roads. However, these numbers 
are reported to the state by the cities themselves, who may have an incentive to 
exaggerate their values, so that their preferred option will be selected. Thus, the 
state would like to employ a mechanism that incentivizes truth-telling. 


16.2. Social surplus maximization and the general VCG mechanism 


A mechanism M selects an outcome from a set of possible outcomes A, based
on inputs from a set of agents. Each agent i has a valuation function $v_i : A \to \mathbb{R}$
that maps the possible outcomes A to nonnegative real numbers. The quantity
$v_i(a)$ represents the value¹ that i assigns to outcome $a \in A$, measured in a common
currency, such as dollars. We denote by V(a) the value the (mechanism) designer
has for outcome a. Given the reported valuation functions, the mechanism selects
an outcome and a set of payments, one per agent. We have seen the following
definition several times; we repeat it in the context of this more general setting.


DEFINITION 16.2.1. We say that a mechanism M is truthful if, for each agent
i, each valuation function $v_i(\cdot)$, and each possible report $b_{-i}$ of the other agents, it
is a dominant strategy for agent i to report his valuation truthfully. Formally, for
all $b_{-i}$, all i, $v_i(\cdot)$, and $b_i(\cdot)$,

$$u_i[v_i, b_{-i} \mid v_i] \ge u_i[b_i, b_{-i} \mid v_i],$$

where i’s utility² is $u_i[b_i, b_{-i} \mid v_i] = v_i(a(b)) - p_i(b)$. Here $b = (b_1(\cdot), b_2(\cdot), \ldots, b_n(\cdot))$,
$a(b) \in A$ is the outcome selected by M on input b, and $p_i(b)$ is the payment of
agent i.

DEFINITION 16.2.2. The social surplus of an outcome a is $\sum_i v_i(a) + V(a)$.
The reported social surplus of an outcome a is $\sum_i b_i(a) + V(a)$.


1 See the discussion in §15.4.3.


2 This is called quasilinear utility and is applicable when the value is measured in the same
units as the payments.
[Figure 16.2 diagram: the mechanism selects $a^* = \operatorname{argmax}_{a \in A} \sum_j b_j(a)$ and charges agent i the payment $p_i(b) = -\sum_{j \ne i} b_j(a^*) + h_i(b_{-i})$.]


FIGURE 16.2. A depiction of the VCG mechanism for the setting where
V(a) = 0 for all $a \in A$. The outcome selected is $a^*(b)$, and the payment
of agent i is $p_i(b)$. Note that Theorem 16.2.6 holds for any choice of functions
$h_i(b_{-i})$ as long as it is independent of $b_i(\cdot)$. In Definition 16.2.3 we take
$h_i(b_{-i}) = \max_a \sum_{j \ne i} b_j(a)$. This choice guarantees that $u_i[v_i, b_{-i} \mid v_i] \ge 0$ for
all $v_i$, so long as $v_i(a) \ge 0$ for all a and i.


DEFINITION 16.2.3. The Vickrey-Clarke-Groves (VCG) mechanism,
illustrated in Figure 16.2, works as follows: Each agent is asked to report his valuation
function $v_i(\cdot)$ and submits a function $b_i(\cdot)$ (which may or may not equal $v_i(\cdot)$).
Write $b = (b_1(\cdot), \ldots, b_n(\cdot))$. The outcome selected is

$$a^* := a^*(b) = \operatorname{argmax}_a \Big( \sum_j b_j(a) + V(a) \Big),$$

breaking ties arbitrarily. The payment $p_i(b)$ agent i makes is the loss his presence
causes others (with respect to the reported bids); formally,

$$p_i(b) = \max_a \Big( \sum_{j \ne i} b_j(a) + V(a) \Big) - \Big( \sum_{j \ne i} b_j(a^*) + V(a^*) \Big). \qquad (16.1)$$

The first term is the total reported value the other agents would obtain if i was
absent, and the term being subtracted is the total reported value the others obtain
when i is present.


EXAMPLE 16.2.4. Consider the outcome and payments for the VCG mechanism
on Example 16.1.2, assuming that the cities report truthfully.

                     road A-C   road B-C   both   none
City A                   5          0        5      0
City B                   0          5        5      0
City C                   9          9       15      0
state                  -10        -10      -20      0
social surplus           4          4        5      0
surplus without A       -1          4        0      0
surplus without C       -5         -5      -10      0

As we saw before, the social surplus maximizing outcome would be to build 
both roads, yielding a surplus of 5. What about the payments using VCG? For 
city A, the surplus attained by others in A’s absence is 4 (road B-C only would be 
built), whereas with city A, the surplus attained by others is 0, and therefore city 
A’s payment, the harm its presence causes others, is 4. By symmetry, B’s payment 
is the same. For city C, the surplus attained by others in C’s absence is 0, whereas 
the surplus attained by others in C’s presence is —10, and therefore C’s payment is 
10. Notice that the total payment is 18, whereas the state spends 20. 
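
The payments in this example follow mechanically from (16.1). The sketch below is an illustration only: the function name `vcg`, the dictionary encoding of valuations, and tie-breaking by the first maximizing outcome are our assumptions.

```python
def vcg(bids, designer_value):
    """VCG over a finite outcome set.

    bids: agent -> {outcome: reported value}; designer_value: outcome -> V(a).
    Returns (selected outcome, {agent: payment}) following (16.1).
    """
    outcomes = list(designer_value)

    def surplus(a, excluded=None):
        # Reported surplus of outcome a, leaving out agent `excluded` (if any).
        return designer_value[a] + sum(
            b[a] for agent, b in bids.items() if agent != excluded)

    chosen = max(outcomes, key=surplus)          # ties: first maximizer
    payments = {i: max(surplus(a, i) for a in outcomes) - surplus(chosen, i)
                for i in bids}
    return chosen, payments


roads = ["A-C", "B-C", "both", "none"]
bids = {                                         # truthful reports, in $ millions
    "A": dict(zip(roads, [5, 0, 5, 0])),
    "B": dict(zip(roads, [0, 5, 5, 0])),
    "C": dict(zip(roads, [9, 9, 15, 0])),
}
state = dict(zip(roads, [-10, -10, -20, 0]))
chosen, payments = vcg(bids, state)
print(chosen, payments, sum(payments.values()))
```

The printed total can be compared with the $20 million the state spends on the two roads.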


EXAMPLE 16.2.5 (Employee housing). A university owns a number of homes
and plans to lease them to employees. They choose the allocation and pricing by
running a VCG auction. A set of n employees, each interested in leasing at most
one house, participates in the auction. The i-th employee has value $v_{ij}$ for a yearly
lease of the j-th house. Figure 16.3 shows an example.

[Figure 16.3 diagram: three bipartite graphs between employees and houses 1, 2, 3, in panels titled “With a, b, c”, “Without a”, and “Without b”.]


FIGURE 16.3. The label on the edge from i on the left to j on the right is the
value $v_{ij}$ that employee i has for a yearly lease of house j (say in thousands
of dollars). The VCG mechanism allocates according to purple shaded edges.
The payment of bidder a is 0 since in his absence house 3 is still allocated
to bidder b. The payment of bidder b is 1 since in his absence the allocation
is as follows: house 2 to bidder a and house 3 to bidder c, and therefore the
externality he imposes is 1.
THEOREM 16.2.6. VCG is a truthful mechanism for maximizing social surplus.


PROOF. We prove the theorem assuming that the designer’s value V(a) for all
outcomes a is 0. See Exercise 16.1 for the general case. Fix the reports $b_{-i}$ of all
agents except agent i. Suppose that agent i’s true valuation function is $v_i(\cdot)$, but
he reports $b_i(\cdot)$ instead. Then the outcome is

$$a^* = \operatorname{argmax}_a \sum_j b_j(a)$$

and his payment is

$$p_i(b) = \max_a \sum_{j \ne i} b_j(a) - \sum_{j \ne i} b_j(a^*) = C - \sum_{j \ne i} b_j(a^*),$$

where $C := \max_a \sum_{j \ne i} b_j(a)$ is a constant that agent i’s report has no influence
on. Thus, agent i’s utility is

$$u_i[b \mid v_i] = v_i(a^*) - p_i(b) = v_i(a^*) + \sum_{j \ne i} b_j(a^*) - C. \qquad (16.2)$$

On the other hand, if he were to report his true valuation function $v_i(\cdot)$, the outcome
would be

$$a' = \operatorname{argmax}_a \Big( v_i(a) + \sum_{j \ne i} b_j(a) \Big),$$

and thus, by (16.2), his utility would be

$$u_i[v_i, b_{-i} \mid v_i] = v_i(a') + \sum_{j \ne i} b_j(a') - C \ge v_i(a^*) + \sum_{j \ne i} b_j(a^*) - C.$$

In other words, $u_i[b \mid v_i] \le u_i[v_i, b_{-i} \mid v_i]$ for every b, $v_i(\cdot)$, and i.

REMARK 16.2.7. It follows from the proof of Theorem 16.2.6 that, if all the
bids are truthful, then bidder i’s utility is $u_i[v \mid v_i] = \max_{a \in A} \sum_j v_j(a) - h_i(v_{-i})$,
which is the same, up to a constant, as the objective function optimized by the
mechanism designer.


The following example illustrates a few of the deficiencies of the VCG mecha- 
nism. 


EXAMPLE 16.2.8 (Spectrum auctions, revisited). Company A has recently 
entered the market and needs two licenses in order to operate efficiently enough to 
compete with the established companies. Thus, A has no value for a single license 
but values a pair of licenses at $1 billion. Companies B and C are already well 
established and only seek to expand capacity. Thus, each one needs just one license 
and values that license at $1 billion. 

Suppose the government runs a VCG auction to sell two licenses. If only 
companies A and B compete in the auction, the government revenue is $1 billion 
(either A or B can win). However, if A, B, and C all compete, then companies 
B and C will each receive a license but pay nothing. Thus, VCG revenue is not 
necessarily monotonic in participation or bidder values. 

A variant on this same setting illustrates another problem with the VCG mech- 
anism that we saw earlier: susceptibility to collusion. Suppose that company A’s 
preferences are as above and companies B and C still only need one license each, 
but now they only value a license at $25 million. In this case, if companies B and
C bid honestly, they lose the auction. However, if they collude and each bid $1
billion, they both win at a price of $0.
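
The numbers in Example 16.2.8 can be checked with a small brute-force computation. The following is only a sketch under stated assumptions: the function `vcg` and the helpers `needs_pair` and `needs_one` are hypothetical names introduced here, amounts are in millions of dollars, the designer's value is taken to be 0, and ties are broken by enumeration order.

```python
from itertools import product

def vcg(valuations, n_items=2):
    """Brute-force VCG for identical items (illustrative sketch).

    `valuations` maps a bidder name to a function k -> reported value of
    receiving k licenses. Returns (allocation, payments).
    """
    names = list(valuations)

    def best(excluded=None):
        # Surplus-maximizing split of the items among the active bidders.
        active = [x for x in names if x != excluded]
        best_val, best_alloc = None, None
        for counts in product(range(n_items + 1), repeat=len(active)):
            if sum(counts) > n_items:
                continue
            val = sum(valuations[x](k) for x, k in zip(active, counts))
            if best_val is None or val > best_val:
                best_val, best_alloc = val, dict(zip(active, counts))
        return best_val, best_alloc

    total, alloc = best()
    payments = {}
    for i in names:
        others_without_i, _ = best(excluded=i)
        others_with_i = total - valuations[i](alloc[i])
        payments[i] = others_without_i - others_with_i  # externality imposed by i
    return alloc, payments

def needs_pair(v):
    # Bidder who values only a pair of licenses.
    return lambda k: v if k >= 2 else 0

def needs_one(v):
    # Bidder who values a single license.
    return lambda k: v if k >= 1 else 0

# Only A and B compete.
_, pay_ab = vcg({"A": needs_pair(1000), "B": needs_one(1000)})
# A, B, and C compete (also the reports made by colluding B and C).
alloc_abc, pay_abc = vcg({"A": needs_pair(1000), "B": needs_one(1000), "C": needs_one(1000)})
# Variant: B and C honestly report $25 million each.
alloc_honest, pay_honest = vcg({"A": needs_pair(1000), "B": needs_one(25), "C": needs_one(25)})

print(sum(pay_ab.values()), alloc_abc, sum(pay_abc.values()), alloc_honest)
```

Under these assumptions, revenue drops from 1000 to 0 when C joins, and in the variant the honest reports of B and C lose to A while the inflated reports win both licenses at price 0.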


16.3. Scoring rules 


16.3.1. Keeping the meteorologist honest. On the morning of each day
$t = 1, 2, \ldots$, a meteorologist M reports an estimate $q_t$ of the probability that it will
rain that day. With some effort, he can determine the true probability $p_t$ that it
will rain that day, given the current atmospheric conditions, and set $q_t = p_t$. Or,
with no effort, he could just guess this probability. How can his employer motivate
him to put in the effort and report $p_t$?

A first idea is to pay M, at the end of the $t$-th day, the amount $q_t$ (or some
dollar multiple of that amount) if it rains and $1 - q_t$ if it shines. If $p_t = p > \tfrac{1}{2}$ and
M reports truthfully, then his expected payoff is $p^2 + (1 - p)^2$, while if he were to
report $q_t = 1$, then his expected payoff would be $p = p^2 + p(1 - p)$, which is higher.
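
A quick numerical illustration of why this first idea fails; this is a sketch, with $p = 0.7$ an arbitrary choice and `expected_linear_payoff` a hypothetical helper name.

```python
def expected_linear_payoff(p, q):
    # Pay q if it rains (probability p) and 1 - q if it shines.
    return p * q + (1 - p) * (1 - q)

p = 0.7
truthful = expected_linear_payoff(p, p)    # p^2 + (1 - p)^2
extreme = expected_linear_payoff(p, 1.0)   # p

# Search a grid of reports for the one maximizing the expected payoff.
grid = [k / 100 for k in range(101)]
best_report = max(grid, key=lambda q: expected_linear_payoff(p, q))

print(truthful, extreme, best_report)
```

Since the expected payoff is linear in the report with slope $2p - 1$, the best report is an endpoint of $[0,1]$ whenever $p \ne \tfrac{1}{2}$.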

Another idea is to pay M an amount that depends on how calibrated his
forecast is. Suppose that M reports $q_t$ values on a scale of $\tfrac{1}{10}$ so that he has nine
choices,$^3$ namely $\{k/10 : k \in \{1, \ldots, 9\}\}$. When a year has gone by, the days of that
year may be divided into nine types according to the $q_t$ value that the weatherman
declared. Suppose there are $n_k$ days that the predicted value $q_t$ is $k/10$, while in fact
it rained on $r_k$ of these $n_k$ days. Then, the forecast is calibrated if $r_k/n_k$ is close
to $k/10$ for all $k$. Thus, we might want to penalize the weatherman according to
the squared error of his predictions; i.e.,

$$\sum_{k=1}^{9} \Bigl( \frac{r_k}{n_k} - \frac{k}{10} \Bigr)^2.$$

Unfortunately, it is easy to be calibrated without being accurate: If the true prob-
ability of rain is 90% for half of the days each year and 10% the rest and M reports
50% every day, then he is calibrated for the year.
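
To make the last point concrete, here is a sketch with a hypothetical 200-day "year" in which the observed frequencies match the true probabilities exactly. The helper names and the data are assumptions made for illustration, not part of the text.

```python
from collections import defaultdict

def calibration_error(forecasts, outcomes):
    """Sum over declared values q of (observed rain frequency - q)^2."""
    days = defaultdict(list)
    for q, rained in zip(forecasts, outcomes):
        days[q].append(rained)
    return sum((sum(r) / len(r) - q) ** 2 for q, r in days.items())

def mean_squared_error(forecasts, outcomes):
    # Average squared distance between the forecast and the 0/1 outcome.
    return sum((q - x) ** 2 for q, x in zip(forecasts, outcomes)) / len(outcomes)

# 100 days with rain probability 0.9 (rain on 90), then 100 days with 0.1 (rain on 10).
outcomes = [1] * 90 + [0] * 10 + [1] * 10 + [0] * 90
informed = [0.9] * 100 + [0.1] * 100   # reports the true probability each day
lazy = [0.5] * 200                      # reports 50% every day

print(calibration_error(lazy, outcomes), calibration_error(informed, outcomes))
print(mean_squared_error(lazy, outcomes), mean_squared_error(informed, outcomes))
```

Both forecasters are perfectly calibrated on this data, yet the informed forecasts are much closer to the outcomes, which is what a good payment scheme should reward.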


16.3.2. A solution. Suppose that the employer pays $s_1(q_t)$ to M if it rains
and $s_2(1 - q_t)$ if it shines on day $t$. If $p_t = p$ and $q_t = q$, then the expected payment
made to M on day $t$ is

$$g_p(q) := p\, s_1(q) + (1 - p)\, s_2(1 - q).$$

The employer’s goal is to choose the functions $s_1, s_2 : [0,1] \to \mathbb{R}$ so that

$$g_p(p) \ge g_p(q) \quad \text{for all } q \in [0,1] \setminus \{p\}. \tag{16.3}$$

In this case, the pair $s_1(\cdot), s_2(\cdot)$ is called a proper scoring rule. Suppose these
functions are differentiable. Then we must have

$$g_p'(p) = p\, s_1'(p) - (1 - p)\, s_2'(1 - p) = 0$$

for all $p$. Define

$$h(p) := p\, s_1'(p) = (1 - p)\, s_2'(1 - p). \tag{16.4}$$

Thus

$$g_p'(q) = p\, s_1'(q) - (1 - p)\, s_2'(1 - q) = p\,\frac{h(q)}{q} - (1 - p)\,\frac{h(q)}{1 - q} = h(q) \Bigl( \frac{p}{q} - \frac{1 - p}{1 - q} \Bigr). \tag{16.5}$$


3 M’s instruments are not precise enough to yield complete certainty.

PROPOSITION 16.3.1. A pair of smooth$^4$ functions $s_1, s_2$ define a proper scoring
rule if and only if there is a continuous $h : (0,1) \to [0,\infty)$ such that for all $p$

$$s_1'(p) = \frac{h(p)}{p} \quad \text{and} \quad s_2'(p) = \frac{h(1-p)}{p}. \tag{16.6}$$

PROOF. If $s_1, s_2$ satisfy (16.6) and $h(\cdot)$ is nonnegative, then $g_p(q)$ is indeed
maximized at $p$ by (16.5) since $p/q - (1 - p)/(1 - q)$ is positive for $q < p$ and
negative for $q > p$.
Conversely, if the smooth functions $s_1, s_2$ define a proper scoring rule, then $h$
defined by (16.4) is continuous. If there is some $p$ for which $h(p) < 0$, then $g_p'(q) > 0$
for $q \downarrow p$, and the scoring rule is not proper at $p$.


REMARK 16.3.2. If $h(p) > 0$ for all $p$, then the inequality in (16.3) is strict and
the scoring rule is called strictly proper.


For example, we can apply Proposition 16.3.1 to obtain the following two well-
known scoring rules:

• Letting $h(p) = 1$, we get the logarithmic scoring rule:

$$s_1(p) = s_2(p) = \log p.$$

• Letting $h(p) = 2p(1 - p)$, we get the quadratic scoring rule:

$$s_1(p) = s_2(p) = \int_0^p 2(1 - x)\, dx = 2p - p^2.$$
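
As a sanity check, one can verify numerically that truthful reporting maximizes the expected payment $g_p(q) = p\,s_1(q) + (1-p)\,s_2(1-q)$ under both rules. The sketch below searches a grid of reports; the grid resolution and the sample values of $p$ are arbitrary choices, and the helper names are hypothetical.

```python
import math

def expected_payment(p, q, s1, s2):
    # g_p(q) = p * s1(q) + (1 - p) * s2(1 - q)
    return p * s1(q) + (1 - p) * s2(1 - q)

def quad(x):
    return 2 * x - x * x

log_rule = (math.log, math.log)   # s1 = s2 = log
quadratic_rule = (quad, quad)     # s1 = s2 = 2x - x^2

# Reports strictly inside (0, 1), where the logarithm is finite.
grid = [k / 100 for k in range(1, 100)]

def best_report(p, rule):
    return max(grid, key=lambda q: expected_payment(p, q, *rule))

checks = [(p, best_report(p, log_rule), best_report(p, quadratic_rule))
          for p in (0.1, 0.3, 0.5, 0.75, 0.9)]
print(checks)
```

For each sampled $p$, the maximizing report on the grid is $p$ itself for both rules.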


16.3.3. A characterization of scoring rules*. We extend the previous dis-
cussion to the case where the prediction is not binary.


DEFINITION 16.3.3. Consider a forecaster whose job is to assign probabilities to
$n$ possible outcomes. Denote by $\Delta_n^{\circ}$ the open simplex consisting of strictly positive
probability vectors. A scoring rule $s : \Delta_n^{\circ} \to \mathbb{R}^n$ specifies the score/reward the
forecaster will receive as a function of his prediction$^5$ and the outcome. That is, if
$s(q) = (s_1(q), \ldots, s_n(q))$, then $s_i(q)$ is the reward for report $q = (q_1, \ldots, q_n)$ if the
$i$-th outcome occurs. The scoring rule is proper if for all $p$ the function

$$g_p(q) = \sum_i p_i\, s_i(q)$$

is maximized at $q = p$. Thus, a scoring rule is proper if a forecaster who believes
the probability distribution over outcomes is $p$ maximizes his expected reward by
reporting this distribution truthfully. (If the maximum is attained only at $p$, then
$s$ is called a strictly proper scoring rule.)


In order to characterize proper scoring rules, we recall the following definition.$^6$


4 A function is smooth if it has a continuous derivative.
5 We’ve restricted predictions to the interior of the simplex because one of the most important
scoring rules, the logarithmic scoring rule, becomes $-\infty$ at the boundary of the simplex.
6 See Appendix C for a review of basic properties of convex functions.

DEFINITION 16.3.4. Let $K \subseteq \mathbb{R}^n$ be a convex set. A vector $v \in \mathbb{R}^n$ is a
subgradient of a convex function $f : K \to \mathbb{R}$ at $y \in K$ if for all $x \in K$

$$f(x) \ge f(y) + v \cdot (x - y).$$

If $f$ extends to a differentiable, convex function on an open neighborhood of $K \subset \mathbb{R}^n$,
then $\nabla f(y)$ is a subgradient at $y$ for every $y \in K$.


THEOREM 16.3.5. Let $s : \Delta_n^{\circ} \to \mathbb{R}^n$. Then $s(\cdot)$ is a proper scoring rule if and
only if there is a convex function $f : \Delta_n^{\circ} \to \mathbb{R}$ such that for all $q \in \Delta_n^{\circ}$ there is a
subgradient $v_q$ of $f$ at $q$ satisfying

$$s_i(q) = f(q) + (e_i - q) \cdot v_q \quad \forall i. \tag{16.7}$$

PROOF. Suppose that $s$ is a proper scoring rule. Let

$$f(p) := p \cdot s(p) = \sup_{q \in \Delta_n^{\circ}} p \cdot s(q).$$

Since $f(\cdot)$ is the supremum of linear functions, it is convex. Next, we fix $q$. Then
for all $p$

$$f(p) \ge p \cdot s(q) = f(q) + (p - q) \cdot s(q).$$

Thus, $s(q)$ is a subgradient at $q$ and (16.7) holds for $v_q = s(q)$.
For the converse, suppose that $s(\cdot)$ is of the form (16.7) for some convex function
$f$. We observe that

$$p \cdot s(q) = \sum_i p_i \bigl[ f(q) + (e_i - q) \cdot v_q \bigr] = f(q) + (p - q) \cdot v_q = f(p) - \bigl[ f(p) - f(q) - (p - q) \cdot v_q \bigr].$$

In particular, $p \cdot s(p) = f(p)$. Since $v_q$ is a subgradient, the quantity inside the
square brackets is nonnegative for all $q$, so $p \cdot s(q) \le f(p) = p \cdot s(p)$.


REMARK 16.3.6. The proof implies that $s(\cdot)$ is a strictly proper scoring rule if
and only if there is a strictly convex function $f(\cdot)$ for which (16.7) holds.


COROLLARY 16.3.7. Using the recipe of the previous theorem, we obtain proper
scoring rules as follows:

• The quadratic scoring rule is

$$s_i(q) = 2 q_i - \sum_j q_j^2.$$

This is the scoring rule obtained using

$$f(q) = \sum_i q_i^2.$$

• The logarithmic scoring rule is

$$s_i(q) = \log(q_i).$$

This is the scoring rule obtained using

$$f(q) = \sum_i q_i \log q_i.$$


EXERCISE 16.a. Prove Corollary 16.3.7.
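
The recipe $s_i(q) = f(q) + (e_i - q) \cdot \nabla f(q)$ is easy to experiment with numerically. The sketch below is an illustration rather than a proof: it builds the scores from the two convex functions above at a random interior point and spot-checks properness on random pairs $(p, q)$. The helper names and the random seed are assumptions made here.

```python
import math
import random

def score_from_convex(f, grad_f, q):
    """Scores s_i(q) = f(q) + (e_i - q) . grad f(q), following (16.7)."""
    g = grad_f(q)
    base = f(q) - sum(qi * gi for qi, gi in zip(q, g))
    return [base + gi for gi in g]

def f_quad(q):
    return sum(x * x for x in q)

def grad_quad(q):
    return [2 * x for x in q]

def f_log(q):
    return sum(x * math.log(x) for x in q)

def grad_log(q):
    return [math.log(x) + 1 for x in q]

def random_interior_point(n, rng):
    # A strictly positive probability vector.
    w = [rng.random() + 0.01 for _ in range(n)]
    total = sum(w)
    return [x / total for x in w]

def expected_score(p, scores):
    return sum(pi * si for pi, si in zip(p, scores))

rng = random.Random(0)
q = random_interior_point(4, rng)
quad_scores = score_from_convex(f_quad, grad_quad, q)
log_scores = score_from_convex(f_log, grad_log, q)

def is_proper_on_samples(f, grad_f, trials=200):
    # Check p . s(q) <= p . s(p) on random pairs, up to rounding error.
    for _ in range(trials):
        p, r = random_interior_point(4, rng), random_interior_point(4, rng)
        truthful = expected_score(p, score_from_convex(f, grad_f, p))
        deviating = expected_score(p, score_from_convex(f, grad_f, r))
        if deviating > truthful + 1e-12:
            return False
    return True

print(is_proper_on_samples(f_quad, grad_quad), is_proper_on_samples(f_log, grad_log))
```

The computed scores agree with the closed forms in Corollary 16.3.7 up to floating-point error, which can serve as a check on a solution to Exercise 16.a.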
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Notes 
The VCG mechanism is named for Vickrey, Clarke [Cla71], and Groves.
Vickrey’s work focused on single-item auctions and multiunit auctions with downward-sloping
valuations, Clarke studied the public project problem, and Groves presented the gen-
eral formulation of VCG. Clarke proposed the “Clarke pivot rule,” the specific constant
added to payments that guarantees individually rational mechanisms. This rule enables
the interpretation of the payments as the externalities imposed on other bidders.

“The lovely but lonely Vickrey auction” by Ausubel and Milgrom describes 
the features and deficiencies of VCG and discusses why it is used infrequently in practice. 
For more on VCG, the related theory, and its many applications, see 
[Rou14].

In some settings, e.g., combinatorial auctions, computing the social surplus maxi- 
mizing outcome is intractable. Unfortunately, the VCG payment scheme does not re- 
main truthful when an approximate social surplus maximizing outcome is selected (see, 
e.g., ). 

Consider the setting of and let f be a function from valuation functions to 
outcomes A. We say that the function f can be truthfully implemented if there is a 
payment rule from reported valuation functions to outcomes such that for every $i$, $v_i(\cdot)$,
$b_i(\cdot)$, and $b_{-i}(\cdot)$,

$$v_i(f(v_i, b_{-i})) - p(v_i, b_{-i}) \ge v_i(f(b_i, b_{-i})) - p(b_i, b_{-i}).$$

In other words, the payment rule incentivizes truth-telling. We’ve already seen that the
social surplus function, $f = \operatorname{argmax}_{a \in A} \sum_i v_i(a)$, can be truthfully implemented. Affine
maximization, i.e., $f = \operatorname{argmax}_{a \in A'} \bigl( \sum_i c_i v_i(a) + \gamma_a \bigr)$, where $A' \subseteq A$, can also be truthfully
implemented. (See Exercise 16.3.) In single-parameter domains (such as the win/lose
settings discussed in Chapter 15), where the bidder’s private information is captured by a
single number, any monotone function can be truthfully implemented. (See, e.g., [Nis07].)
In contrast, when valuation functions $v_i : A \to \mathbb{R}$ can be arbitrary, Roberts has
shown that the only functions that can be implemented by a truthful mechanism are affine
maximizers. For other results on this and related topics, see, e.g.,
[SY05], [AK08].

As we have discussed, mechanism design is concerned with the design of protocols (and 
auctions) so that rational participants, motivated solely by their self-interest, will end up 
achieving the designer’s goal. Traditional applications of mechanism design include writing 
insurance contracts, regulating public utilities and constructing the tax code. Modern 
applications include scheduling tasks in the cloud, routing traffic in a network, allocating 
resources in a distributed system, buying and selling goods in online marketplaces, etc. 
Moreover, some of these problems are being solved on a grand scale, with mechanisms 
that are implemented in software (and sometimes hardware). Even the bidding is often 
done by software agents. For this reason, efficient computability has become important. 

The objective of understanding to what extent appropriate incentives and computa- 
tional efficiency can be achieved simultaneously was brought to the fore in a paper by 
Noam Nisan and Amir Ronen, who shared the 2012 Gödel Prize$^7$ for “laying the founda-
tions of algorithmic game theory.”

For further background on algorithmic game theory, with emphasis on its computa- 
tional aspects, see and the survey chapter by Nisan 
in [ZI]. 

7 The Gödel Prize is an annual prize for outstanding papers in theoretical computer science.
As discussed in Chapter 8, Nisan and Ronen shared the prize with Koutsoupias, Papadimitriou,
Roughgarden, and Tardos.

(Photograph: Noam Nisan.)

The first scoring rule is due to Brier [Bri50], who discussed, in an equivalent form,
the quadratic scoring rule. Selten [Sel98] provides an axiomatic characterization. The
logarithmic scoring rule was first introduced by Toda and further developed by
Winkler and Murphy [Win69a]. Theorem is adapted from Gneiting and 
Raftery [GR07].

In the text, we argued that the weather forecaster M can achieve calibration just by 
knowing the overall percentage of rainy days. In fact, M can be well-calibrated without 
any information about the weather, as was first shown by Foster and Vohra [FV99]. See 
also [HMC00a]. 


Another interesting application of scoring rules is in prediction markets. See, e.g.,
[CP10].


Exercises 


16.1. Generalize the VCG mechanism to handle the case where the designer has
a value $V(a)$ for each outcome $a \in A$. Specifically, suppose the outcome
selected maximizes social surplus (see Definition 16.2.2). Show that the
payment scheme proposed in (16.1) incentivizes truthful reporting.


16.2. Consider two bidders A and B and two items a and b. Bidder A’s value for 
item a is 3, for item b is 2, and for both is 5. Bidder B’s value for item a is 
2, and for item b is 2, and for both is 4. Suppose that two English auctions 
are run simultaneously for the two items. Show that truthful bidding is 
not a dominant strategy in this auction. 


16.3. Consider the setting discussed in §16.2. Fix a subset $A'$ of the outcomes $A$,
a set of numbers $\{\gamma_a \mid a \in A'\}$, and $c_i$ for each $1 \le i \le n$. Design a truthful
mechanism that takes as input the agents’ reported valuation functions,
and chooses as output an outcome $a \in A'$ that maximizes $\gamma_a + \sum_i c_i v_i(a)$.
In other words, come up with a payment scheme that incentivizes truthful
reporting.


16.4. Let $f$ be a mapping from reported valuation functions $v = (v_1(\cdot), \ldots, v_n(\cdot))$
to outcomes in some set $A$. Suppose that M is a direct revelation mech-
anism that on input $v$ chooses outcome $f(v)$ and sets payments $p_i(v)$ for
each agent. Show that M is truthful if and only if the following conditions
hold for each $i$ and $v_{-i}$:
• The payment doesn’t depend on $v_i$, but only on the alternative se-
lected. That is, for fixed $v_{-i}$ and for all $v_i(\cdot)$ such that $f(v_i, v_{-i}) = a$,
the payment $p_i$ is the same. Thus, for all $v_i$ with $f(v_i, v_{-i}) = a$, it
holds that $p_i(v_i, v_{-i}) = p_a$.
• For each $v_{-i}$, let $A(v_{-i})$ be the range of $f(\cdot, v_{-i})$. Then
$f(v_i, v_{-i}) \in \operatorname{argmax}_{a \in A(v_{-i})} (v_i(a) - p_a)$.
16.5. (a) Show that the scoring rule $s_i^{\alpha}(p) = \alpha p_i^{\alpha - 1} - (\alpha - 1) \sum_j p_j^{\alpha}$ is proper
for any $\alpha > 1$.
(b) Show that these scoring rules interpolate between the logarithmic and
the quadratic scoring rules, by showing that

$$\lim_{\alpha \downarrow 1} \frac{s_i^{\alpha}(p) - 1}{\alpha - 1} = \log p_i.$$

16.6. Show that the scoring rule

$$s_i(p) = \frac{p_i}{\|p\|}$$

is proper.
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CHAPTER 17 


Matching markets 


17.1. Maximum weighted matching 


Consider a seller with $n$ different items for sale and $n$ buyers, each interested
in buying at most one of these items. Suppose that the seller prices item $j$ at $p_j$, so
the price vector is $p = (p_1, \ldots, p_n)$. Buyer $i$’s value for item $j$ is $v_{ij} \ge 0$; i.e., buyer
$i$ would only be willing to buy item $j$ if $p_j \le v_{ij}$, and, in this case, his utility for
item $j$ is $v_{ij} - p_j$. Given these prices, each buyer $i$ has a set of preferred items:

$$D_i(p) = \{ j \mid \forall k \;\; v_{ij} - p_j \ge v_{ik} - p_k \text{ and } v_{ij} \ge p_j \}. \tag{17.1}$$

If each buyer with nonempty $D_i(p)$ selects one of his preferred items, a conflict
could arise between buyers who select the same item.

We will show that there is a price vector $p^*$ and a corresponding perfect match-
ing $M$ between buyers and items in which each buyer is matched to one of his
preferred items (so there is no conflict). Moreover, this matching $M$ maximizes the
social surplus:

$$\sum_i \bigl( v_{iM(i)} - p^*_{M(i)} \bigr) + \sum_j p^*_j = \sum_i v_{iM(i)}.$$

This will follow from a generalization of König’s Lemma (Lemma 3.2.5).


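THEOREM 17.1.1. Given a nonnegative matrix $V = (v_{ij})_{n \times n}$, let

$$K := \{ (u, p) \in \mathbb{R}^n \times \mathbb{R}^n : u_i, p_j \ge 0 \text{ and } u_i + p_j \ge v_{ij} \;\; \forall i, j \}.$$

Then

$$\min_{(u,p) \in K} \Bigl\{ \sum_i u_i + \sum_j p_j \Bigr\} = \max_{\text{matchings } M} \Bigl\{ \sum_i v_{i,M(i)} \Bigr\}. \tag{17.2}$$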


REMARKS 17.1.2. 


(1) The permutation maximizing the right-hand side is called a maximum
weight matching, and the pair $(u, p)$ minimizing the left-hand side is
called a minimum (fractional) cover. The special case when the entries
of $V$, $u$, and $p$ are all 0/1 reduces to Lemma 3.2.5.

(2) Let $v = \max v_{ij}$. If $(u, p) \in K$, then replacing every $u_i > v$ and $p_j > v$ by
$v$ yields a pair $(\tilde{u}, \tilde{p}) \in K$ with a smaller sum. Therefore, the minimization
in (17.2) can be restricted to the set $K' := K \cap \{ (u, p) : u_i, p_j \le v \;\; \forall i, j \}$.
This means that a minimum cover exists because the continuous map

$$F(u, p) := \sum_{i=1}^{n} (u_i + p_i),$$
defined on the closed and bounded set $K'$ does indeed attain its infimum.

PROOF OF THEOREM 17.1.1. Suppose the left-hand side of (17.2) is mini-
mized at some $(u^*, p^*) \in K$. By the definition of $K$, the following holds for all $i$
and $j$:

$$u_i^* + p_j^* \ge v_{ij} \quad \text{and therefore} \quad F(u^*, p^*) \ge \sum_i v_{i,M(i)} \;\; \forall \text{ matchings } M.$$

To prove the equality in (17.2), consider the bipartite graph $G$ between the rows
and the columns, in which

$$(i, j) \text{ is an edge iff } u_i^* + p_j^* = v_{ij}.$$

(In the context of the example at the beginning of this section, an edge $(i, j)$ means
that at the prices $p^*$, item $j$ is one of buyer $i$’s preferred items.)
If there is a perfect matching $M^*$ in $G$, we are done. If not, then by Hall’s
Theorem (3.2.2), there is a subset $S$ of rows such that $|N(S)| < |S|$, where $N(S)$
is the set of neighbors of $S$ in $G$; we will show this contradicts the definition of
$(u^*, p^*)$. Let

$$u_i := u_i^* - \delta \cdot 1_{\{i \in S\}} \quad \text{and} \quad p_j := p_j^* + \delta \cdot 1_{\{j \in N(S)\}} \quad \forall i, j,$$

where

$$\delta = \min_{i \in S,\; j \notin N(S)} \bigl( u_i^* + p_j^* - v_{ij} \bigr) > 0.$$

(See Figure 17.1.)

[Figure 17.1: buyers on the left, items on the right; decrease $u_i$ by $\delta$ for all $i \in S$, increase $p_j$ by $\delta$ for all $j \in N(S)$.]

FIGURE 17.1. The figure illustrates the first part of the argument used to
show that there must be a perfect matching.


Then $u_i + p_j \ge v_{ij}$ holds for all $(i, j)$ and, since $|N(S)| < |S|$, we have $F(u, p) <
F(u^*, p^*)$. However, some $u_i$ might be negative. To rectify this, let $\epsilon = \min_j p_j$,
and define

$$\tilde{u}_i := u_i + \epsilon \;\; \forall i \quad \text{and} \quad \tilde{p}_j := p_j - \epsilon \;\; \forall j.$$

Then $\tilde{u}_i + \tilde{p}_j \ge v_{ij}$ holds for all $(i, j)$. Since $\tilde{p}_j = 0$ for some $j$ and $v_{ij} \ge 0$, it follows
that $\tilde{u}_i \ge 0$ for all $i$. Thus, $(\tilde{u}, \tilde{p}) \in K$. Clearly, $F(\tilde{u}, \tilde{p}) = F(u, p) < F(u^*, p^*)$,
contradicting the minimality of $F(u^*, p^*)$.


REMARK 17.1.3. The assumption that V is a square matrix can easily be re- 
laxed. If V has fewer rows than columns, then adding rows of zeros reduces to the 
square case, similarly for columns. 
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17.2. Envy-free prices 


Consider again $n$ buyers and a seller with $n$ items for sale, where $v_{ij}$ is buyer
$i$’s value for item $j$.


DEFINITION 17.2.1. Given the values $V = (v_{ij})$, $1 \le i, j \le n$, and a set of
prices $p = (p_1, \ldots, p_n)$, the demand graph $D(p)$ is the bipartite graph with an
edge between a buyer $i$ and an item $j$ if, at the prices $p$, item $j$ is one of $i$’s most
preferred items; that is,

$$\forall k \quad v_{ij} - p_j \ge v_{ik} - p_k \quad \text{and} \quad v_{ij} \ge p_j.$$

We say that a price vector $p$ is envy-free if there exists a perfect matching in the
demand graph $D(p)$.


LEMMA 17.2.2. Let $V = (v_{ij})_{n \times n}$, $u, p \in \mathbb{R}^n$, all nonnegative, and let $M$ be a
perfect matching from $[n]$ to $[n]$. The following are equivalent:

(i) $(u, p)$ is a minimum cover of $V$ and $M$ is a maximum weight matching
for $V$.

(ii) The prices $p$ are envy-free prices, $M$ is contained in the demand graph
$D(p)$, and $u_i = v_{iM(i)} - p_{M(i)}$.

PROOF. (i)⇒(ii): In a minimum cover $(u, p)$,

$$u_i = \max_{\ell} \,(v_{i\ell} - p_{\ell}). \tag{17.3}$$

Given a minimum cover $(u, p)$, define, as we did earlier, a graph $G$ where $(i, j)$ is
an edge if and only if $u_i + p_j = v_{ij}$. By (17.3), $G$ is the demand graph for $p$. By
Theorem 17.1.1, the weight of a maximum weight matching in $V$ is $\sum_i u_i + \sum_j p_j$.
Therefore

$$\sum_i v_{iM(i)} = \sum_i u_i + \sum_j p_j = \sum_i \bigl( u_i + p_{M(i)} \bigr).$$

Since $v_{iM(i)} \le u_i + p_{M(i)}$ for each $i$ and since we have equality on the sum, these
must be equalities for each $i$, so $M$ is contained in $G = D(p)$.

(ii)⇒(i): To verify that $(u, p)$ is a cover, observe that for all $i, k$, we have
$u_i = v_{iM(i)} - p_{M(i)} \ge v_{ik} - p_k$ since $M$ is contained in the demand graph. The
equality $\sum_i (u_i + p_{M(i)}) = \sum_i v_{iM(i)}$ together with Theorem 17.1.1 implies that $(u, p)$
is a minimum cover and $M$ is a maximum weight matching.


COROLLARY 17.2.3. Let p be an envy-free pricing for V and let M be a perfect 
matching of buyers to items. Then M is a maximum weight matching for V if and 
only if it is contained in D(p). 


17.2.1. Highest and lowest envy-free prices. 


LEMMA 17.2.4. The envy-free price vectors for $V = (v_{ij})_{n \times n}$ form a lattice:
Let $p$ and $q$ be two vectors of envy-free prices. Then, defining

$$a \wedge b := \min(a, b) \quad \text{and} \quad a \vee b := \max(a, b),$$

the two price vectors

$$p \wedge q = (p_1 \wedge q_1, \ldots, p_n \wedge q_n) \quad \text{and} \quad p \vee q = (p_1 \vee q_1, \ldots, p_n \vee q_n)$$

are also envy-free.

PROOF. It follows from Lemma 17.2.2 that $p$ is envy-free iff there is a nonneg-
ative vector $u$ such that $(u, p)$ is a minimum cover of $V$. That is, $u_i + p_j \ge v_{ij}$
for all $(i, j)$ with equality along every edge $(i, j)$ that is in some maximum weight
matching. Thus, it suffices to prove the following:

$$\text{If } (u, p) \text{ and } (\tilde{u}, \tilde{p}) \text{ are minimum covers, then so is } (u \vee \tilde{u},\; p \wedge \tilde{p}). \tag{17.4}$$

Fix an edge $(i, j)$. Without loss of generality $p_j = p_j \wedge \tilde{p}_j$, so

$$(u_i \vee \tilde{u}_i) + (p_j \wedge \tilde{p}_j) \ge u_i + p_j \ge v_{ij}.$$

Moreover, if $(i, j)$ is in a maximum weight matching, then since $u_i + p_j = \tilde{u}_i + \tilde{p}_j =
v_{ij}$, the assumption $p_j \le \tilde{p}_j$ implies that $u_i \ge \tilde{u}_i$, and hence

$$(u_i \vee \tilde{u}_i) + (p_j \wedge \tilde{p}_j) = u_i + p_j = v_{ij}.$$

Switching the roles of $u$ and $p$ in (17.4) shows that if $(u, p)$ and $(\tilde{u}, \tilde{p})$ are minimum
covers, then so is $(u \wedge \tilde{u},\; p \vee \tilde{p})$.


COROLLARY 17.2.5. Let $\underline{p}$ minimize $\sum_j p_j$ among all envy-free price vectors
$p$ for $V$. Then:
(i) Every envy-free price vector $q$ satisfies $\underline{p}_i \le q_i$ for all $i$.
(ii) $\min_j \underline{p}_j = 0$.
PROOF. (i) If not, then $\underline{p} \wedge q$ has lower sum than $\underline{p}$, a contradiction.
(ii) Let $(u, \underline{p})$ be a minimum cover. If $\epsilon = \min_j \underline{p}_j > 0$, then subtracting $\epsilon$ from
each $\underline{p}_j$ and adding $\epsilon$ to each $u_i$ yields a minimum cover with lower prices.


We refer to the vector p in this corollary as the lowest envy-free price 
vector. The next theorem gives a formula for these prices and the corresponding 
utilities. 

THEOREM 17.2.6. Given an $n \times n$ nonnegative valuation matrix $V$, let $M^V$
be a maximum weight matching and let $\|M^V\|$ be its weight; that is, $\|M^V\| =
\sum_i v_{i M^V(i)}$. Write $V_{-i}$ for the matrix obtained by replacing row $i$ of $V$ by 0. Then
the lowest envy-free price vector $\underline{p}$ for $V$ and the corresponding utility vector $\overline{u}$ are
given by

$$M^V(i) = j \;\Longrightarrow\; \underline{p}_j = \|M^{V_{-i}}\| - \bigl( \|M^V\| - v_{ij} \bigr), \tag{17.5}$$
$$\overline{u}_i = \|M^V\| - \|M^{V_{-i}}\| \quad \forall i. \tag{17.6}$$


REMARK 17.2.7. To interpret (17.5), observe that $\|M^{V_{-i}}\|$ is the maximum
weight matching of all agents besides $i$ and $\|M^V\| - v_{ij}$ is the total value received
by these agents in the maximum matching that includes $i$. The difference in (17.5) is
the externality $i$’s presence imposes on the other agents, which is the price charged
to $i$ by the VCG mechanism (as discussed in §16.2).
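
Formulas (17.5) and (17.6) can be evaluated directly on small instances. The sketch below finds maximum weight matchings by brute force over permutations, which is only practical for small $n$; the function names and the $2 \times 2$ matrix are hypothetical choices made for illustration.

```python
from itertools import permutations

def max_weight(V):
    """Weight and assignment of a maximum weight perfect matching (brute force)."""
    n = len(V)
    best = max(permutations(range(n)),
               key=lambda m: sum(V[i][m[i]] for i in range(n)))
    return sum(V[i][best[i]] for i in range(n)), best

def lowest_envy_free(V):
    """Lowest envy-free prices and corresponding utilities via (17.5)-(17.6)."""
    n = len(V)
    total, match = max_weight(V)
    u, p = [0] * n, [0] * n
    for i in range(n):
        # V_{-i}: replace row i of V by zeros.
        V_minus_i = [row[:] if r != i else [0] * n for r, row in enumerate(V)]
        without_i, _ = max_weight(V_minus_i)
        u[i] = total - without_i                              # (17.6)
        p[match[i]] = without_i - (total - V[i][match[i]])    # (17.5)
    return match, u, p

def is_envy_free(V, match, p):
    # Each buyer weakly prefers his own item at prices p and has utility >= 0.
    n = len(V)
    for i in range(n):
        own = V[i][match[i]] - p[match[i]]
        if own < 0 or any(own < V[i][k] - p[k] for k in range(n)):
            return False
    return True

V = [[5, 3],
     [4, 1]]
match, u, p = lowest_envy_free(V)
print(match, u, p, is_envy_free(V, match, p))
```

In this instance buyer 0 receives item 1 and buyer 1 receives item 0, and the price of item 0 is exactly the level at which buyer 0 becomes indifferent between the two items.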


PROOF. Let $p := \underline{p}$ be the lowest envy-free price vector for $V$, and let $u := \overline{u}$
be the corresponding utility vector. We know that

$$\sum_k (u_k + p_k) = \|M^V\|. \tag{17.7}$$

If we could find another matrix for which $p$ is still envy-free, buyer $i$ has utility 0
and all other utilities are unchanged, then applying (17.7) to that matrix in place
of $V$ would yield a formula for $u_i$. A natural way to reduce buyer $i$’s utility is to
zero out his valuations. Below, we will prove the following:

CLAIM 17.2.8. $((0, u_{-i}), p)$ is a minimum cover for $V_{-i}$ (though $p$ may no
longer be the lowest envy-free price vector for $V_{-i}$).

With this claim in hand, we obtain

$$\sum_k (u_k + p_k) - u_i = \|M^{V_{-i}}\|. \tag{17.8}$$

Subtracting (17.8) from (17.7) yields (17.6):

$$u_i = \|M^V\| - \|M^{V_{-i}}\|.$$

Thus, if $M^V(i) = j$, then

$$p_j = v_{ij} - u_i = \|M^{V_{-i}}\| - \bigl( \|M^V\| - v_{ij} \bigr).$$

PROOF OF CLAIM 17.2.8. Clearly $((0, u_{-i}), p)$ is a cover for $V_{-i}$. To prove
that it is a minimum cover, by it suffices to show that there is a 
perfect matching in the demand graph $D'(p)$ for $V_{-i}$. Observe that the only edges
that are changed relative to the demand graph $D(p)$ for $V$ are those incident to $i$.
Furthermore, edge $(i, j)$ is in $D'(p)$ if and only if $p_j = 0$.

Suppose that there is no perfect matching in $D'(p)$. By Hall’s Theorem (3.2.2),
there is a subset $S$ of items such that its set of neighbors $T$ in $D'(p)$ has $|T| < |S|$.
Since there is a perfect matching in $D(p)$ and the only buyer whose edges changed
is $i$, it must be that $i \notin T$, and therefore for all $k \in S$, we have $p_k > 0$. But this
means that, for small enough $\epsilon$, the vector $(u', p')$ with

$$p_k' := p_k - \epsilon \cdot 1_{\{k \in S\}} \quad \text{and} \quad u_\ell' := u_\ell + \epsilon \cdot 1_{\{\ell \in T \cup \{i\}\}} \quad \forall k, \ell$$

is a minimum cover for $V$, a contradiction to the choice of $p$ as the lowest envy-free
price vector. (See Figure 17.2.)

[Figure 17.2: two panels, each with buyers on the left and items on the right. Left panel: buyer $i$, items with $p_j = 0$, and the set $S$ with $p_k > 0$ for all $k \in S$. Right panel: increase $u_i$ by $\epsilon$, increase $u_\ell$ by $\epsilon$ for all $\ell \in T$, decrease $p_k$ by $\epsilon$ for all $k \in S$.]

FIGURE 17.2. The modified utility and price vector shown in the right figure
is a minimum cover for V with prices that have lower sum than p, the presumed
minimum envy-free price vector. This is a contradiction.


Employing the symmetry between $u$ and $p$, we obtain

COROLLARY 17.2.9. Under the hypotheses of Theorem 17.2.6, the highest envy-
free price vector $\overline{p}$ for $V$ and the corresponding utility vector $\underline{u}$ are given by

$$M^V(i) = j \;\Longrightarrow\; \underline{u}_i = \|M^{V^{-j}}\| - \bigl( \|M^V\| - v_{ij} \bigr), \tag{17.9}$$
$$\overline{p}_j = \|M^V\| - \|M^{V^{-j}}\| \quad \forall j, \tag{17.10}$$

where $V^{-j}$ is the matrix obtained by replacing column $j$ of $V$ by 0.


17.2.2. Seller valuations and unbalanced markets. In the previous sec-
tions, we implicitly assumed that the seller had value 0 for the item being sold since
the minimal acceptable price for an item was 0. The general case where the seller
has value $s_j$ for item $j$, i.e., would not accept a price below $s_j$, can be reduced
to this special case by replacing each buyer valuation $v_{ij}$ by $\max(v_{ij} - s_j, 0)$. (If
buyer $i$ is matched to item $j$ with $s_j > v_{ij}$, it must be at a price of 0, and no actual
transaction takes place.)

Also, we assumed that the number of buyers and items was equal. If there are 
n buyers but k < n items for sale, we can add n — k dummy items of value 0 to all 
buyers. Similarly, if there are j < n buyers but n items, we can add n — j dummy 
buyers with value 0 for all items. In the latter case, any item that is matched to a 
dummy buyer (i.e., is unsold) must have price 0. 


17.3. Envy-free division of rent 


Consider a group of $n$ people that would like to find an $n$-room house to rent.
Suppose that person $i$ assigns value $v_{ij}$ to the $j$-th room. Is there an envy-free way$^1$
to partition the rent $R$, accounting for the different valuations $V = (v_{ij})$?


FIGURE 17.3. Three roommates need to decide who will get each room and 
how much of the rent each person will pay. 


Clearly, if the rent is higher than the weight of every matching in $V$, then it
does not admit fair division. However, even if the rent is too low, there may be a
problem. For example, suppose $v_{i1} > v_{i2} + R$ for $i = 1, 2$. Then whichever of renter
1 and 2 is not assigned to room 1 will envy the occupant of that room.


1 That is, a vector $p_1, \ldots, p_n$ of envy-free prices such that $\sum_j p_j = R$.






and provide an a for computing the 
minimum and maximum envy-free rent: Use and to determine the 
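lowest envy-free rent $\underline{R} = \sum_j \underline{p}_j$ and the highest envy-free rent $\overline{R} = \sum_j \overline{p}_j$. By
Lemma 17.2.2, the set of envy-free prices is a convex set. Thus, for any $R =
\alpha \underline{R} + (1 - \alpha) \overline{R}$, with $0 < \alpha < 1$, envy-free prices are obtained by setting

$$p_j = \alpha\, \underline{p}_j + (1 - \alpha)\, \overline{p}_j.$$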


In the next section, we present an algorithm for finding the value of a maximum
weighted matching in a graph. This can be used to compute (17.5) and (17.10).


17.4. Finding maximum matchings via ascending auctions 


Consider an auction with $n$ bidders and $n$ items, where each bidder wants at
most one item, and the valuations of the bidders are given by a valuation matrix
$V = (v_{ij})$.

• Fix the minimum bid increment $\delta = 1/(n + 1)$.
• Initialize the prices $p$ of all items to 0 and set the matching $M$ of bidders
to items to be empty.
• As long as $M$ is not perfect:
  - One unmatched bidder $i$ selects an item $j$ in his demand set

$$D_i(p) = \{ j \mid v_{ij} - p_j \ge v_{ik} - p_k \;\; \forall k \text{ and } v_{ij} \ge p_j \}$$

    and bids $p_j + \delta$ on it.
    (We will see that the demand set $D_i(p)$ is nonempty.)
  - If $j$ is unmatched, then $M(i) := j$; otherwise, say $M(\ell) = j$, remove
    $(\ell, j)$ from the matching and add $(i, j)$, so that $M(i) := j$.
  - Increase $p_j$ by $\delta$.


THEOREM 17.4.1. Suppose that the elements of the valuation matrix $V = (v_{ij})$
are integers. Then the above auction terminates with a maximum weight matching
$M$, and the final prices $p$ satisfy

$$M(i) = j \;\Longrightarrow\; v_{ij} - p_j \ge v_{ik} - p_k - \delta \quad \forall k. \tag{17.11}$$
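
The auction is short enough to run by hand or in code. The following sketch implements the steps above with exact rational arithmetic; it assumes a nonnegative integer matrix, picks the first unmatched bidder and the first surplus-maximizing item (the procedure allows any choice), and uses hypothetical names throughout.

```python
from fractions import Fraction
from itertools import permutations

def ascending_auction(V):
    """Ascending auction for a nonnegative integer matrix V (rows = bidders)."""
    n = len(V)
    delta = Fraction(1, n + 1)      # minimum bid increment
    prices = [Fraction(0)] * n
    owner = [None] * n              # owner[j] = bidder currently matched to item j
    item_of = [None] * n            # item_of[i] = item currently matched to bidder i
    while None in item_of:
        i = item_of.index(None)     # some unmatched bidder
        # An item of maximal surplus v_ij - p_j, i.e., an item in the demand set.
        j = max(range(n), key=lambda k: V[i][k] - prices[k])
        assert V[i][j] >= prices[j]  # the demand set is nonempty
        if owner[j] is not None:
            item_of[owner[j]] = None  # the previous holder of j becomes unmatched
        owner[j], item_of[i] = i, j
        prices[j] += delta
    return item_of, prices, delta

def brute_force_max_weight(V):
    n = len(V)
    return max(sum(V[i][m[i]] for i in range(n)) for m in permutations(range(n)))

V = [[4, 2, 0],
     [3, 3, 1],
     [1, 2, 2]]
item_of, prices, delta = ascending_auction(V)
weight = sum(V[i][item_of[i]] for i in range(len(V)))

# Condition (17.11) for the final matching and prices.
approx_envy_free = all(
    V[i][item_of[i]] - prices[item_of[i]] >= V[i][k] - prices[k] - delta
    for i in range(len(V)) for k in range(len(V))
)
print(item_of, prices, weight, approx_envy_free)
```

Using `Fraction` avoids rounding issues when comparing surpluses that differ by multiples of $\delta$.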


PROOF. From the moment an item is matched, it stays matched forever. Also,
until an item is matched, its price is 0. Therefore, as long as a bidder is unmatched,
there must be an unmatched item, so his demand set is nonempty. Moreover,
all items in $D_i(p)$ have price at most $\max_j v_{ij}$. Since no item’s price can exceed
$\max_{i,j} v_{ij} + \delta$ and some price increases by $\delta$ each round, the auction must terminate.
When it terminates, the matching is perfect.

Suppose that in the final matching, $M(i) = j$. After $i$ bid on $j$ for the last
time, $p_j$ was increased by $\delta$, so (17.11) held. Later, prices of other items were only
increased, so the final prices also satisfy (17.11).

Finally, let $M^*$ be any other perfect matching. By (17.11),

$$\sum_i \bigl( v_{iM(i)} - p_{M(i)} \bigr) \ge \sum_i \bigl( v_{iM^*(i)} - p_{M^*(i)} - \delta \bigr);$$

i.e., since both matchings are perfect and hence $\sum_i p_{M(i)} = \sum_i p_{M^*(i)}$,

$$\sum_i v_{iM(i)} \ge \sum_i v_{iM^*(i)} - n\delta > \sum_i v_{iM^*(i)} - 1.$$

Since the weight of any perfect matching is an integer, $M$ must be a maximum
weight matching.


EXERCISE 17.a. Argue that the number of times the main loop is executed in
the above algorithm is at most $\frac{1}{\delta} \sum_j \max_i (v_{ij} + \delta)$.


17.5. Matching buyers and sellers 


The previous theorems can be interpreted in a different context known as the
assignment game: Consider $n$ buyers and $n$ sellers, e.g., homeowners. (The case
where the number of buyers differs from the number of sellers can be handled as in
§17.2.2.) Let $v_{ij}$ denote the value buyer $i$ assigns to house $j$. We assume that each
house has value 0 to its owner if unsold. (We consider the case where the value is
positive at the end of the section.) If seller $j$ sells the house to buyer $i$ at a price of
$p_j$, then the buyer’s utility is $u_i = v_{ij} - p_j$.


DEFINITION 17.5.1. An outcome $(M, u, p)$ of the assignment game is a match-
ing $M$ between buyers and sellers and a partition $(u_i, p_j)$ of the value $v_{ij}$ on every
matched edge; i.e., $u_i + p_j = v_{ij}$, where $u_i, p_j \ge 0$ for all $i, j$. If buyer $i$ is unmatched,
we set $u_i = 0$. Similarly, $p_j = 0$ if seller $j$ is unmatched.

We say the outcome is stable$^2$ if $u_i + p_j \ge v_{ij}$ for all $i, j$.


By Theorem 17.1.1, we have


PROPOSITION 17.5.2. An outcome $(M, u, p)$ is stable if and only if $M$ is a max-
imum weight matching for $V$ and $(u, p)$ is a minimum cover for $V$. In particular,
every maximum weight matching supports a stable outcome.


DEFINITION 17.5.3. Let $(M, u, p)$ be an outcome of the assignment game. De-
fine the excess $\beta_i$ of buyer $i$ to be the difference between his utility and his best
outside option$^3$; i.e., (denoting $x_+ := \max(x, 0)$),

$$\beta_i := u_i - \max\{ (v_{ik} - p_k)_+ : (i, k) \notin M \}.$$

Similarly, the excess $s_j$ of seller $j$ is

$$s_j := p_j - \max\{ (v_{\ell j} - u_\ell)_+ : (\ell, j) \notin M \}.$$

The outcome is balanced if it is stable and, for every matched edge $(i, j)$, we have
$\beta_i = s_j$.


REMARK 17.5.4. Observe that an outcome is stable if and only if all excesses are
nonnegative. In fact, it suffices for all buyer (or all seller) excesses to be nonnegative
since in an unstable pair both buyer and seller will have a negative excess. If an
outcome is balanced, then every matched pair has reached the Nash bargaining
solution between them. (See Exercise 12.b.)


THEOREM 17.5.5. Every assignment game has a balanced outcome. Moreover,
the following process converges to a balanced outcome: Start with the minimum
cover $(u, p)$ where $p$ is the vector of lowest envy-free prices and a maximum weight
matching $M$. Repeatedly pick an edge in $M$ to balance, ensuring that every edge in
$M$ is picked infinitely often.


2 In other words, an outcome is unstable if there is an unmatched pair $(k, \ell)$ with value
$v_{k\ell} > u_k + p_\ell$.

3 The best outside option is the maximum utility buyer $i$ could obtain by matching with a
different seller and paying him his current price.


We will need the following lemma. 

LEMMA 17.5.6. Let $(M, u, p)$ be a stable outcome with $\beta_i \ge s_j \ge 0$ for every $(i, j) \in M$. Pick a pair $(i, j) \in M$ with $\beta_i > s_j$ and balance the pair by performing the update

$$u_i' := u_i - \frac{\beta_i - s_j}{2} \quad\text{and}\quad p_j' := p_j + \frac{\beta_i - s_j}{2},$$

leaving all other utilities and profits unchanged. Then the new outcome is stable and has excesses $\beta_i' = s_j'$ and $\beta_k' \ge s_\ell' \ge 0$ for all $(k, \ell) \in M$.


PROOF. By construction, after the update, we have $\beta_i' = s_j'$. Next, consider any other matched pair $(k, \ell)$. Since $p_j' > p_j$, buyer $k$'s best outside option is no better than it was before, so $\beta_k' \ge \beta_k$. Similarly, since $u_i' < u_i$, seller $\ell$'s best outside option is at least what it was before, so $s_\ell' \le s_\ell$.

The updated outcome is still stable since all buyer excesses are nonnegative (see Remark 17.5.4): The excess of buyer $i$ is at least half of what it was before and the excesses of other buyers have not decreased.


PROOF OF THEOREM 17.5.5. Let $M$ be a maximum weight matching. Start with the stable outcome defined by lowest envy-free prices $\{p_j^0\}$ and corresponding utilities $\{u_i^0\}$. The corresponding initial buyer excess is $\beta_i^0 \ge 0$ for all $i$, and the initial seller excess is $s_j^0 = 0$ for all $j$. (If $s_j^0 = \epsilon > 0$, then decreasing $p_j$ by $\epsilon$ and increasing $u_i$ by $\epsilon$ is still stable and has lower prices.)

Fix a sequence $\{(i_t, j_t)\}_{t \ge 1}$ of matched pairs. At time $t \ge 1$, balance the edge $(i_t, j_t)$. This yields a new price vector $p^t$ and a new utility vector $u^t$. Using the lemma and induction, we conclude that for all $t$, the outcome $(M, u^t, p^t)$ is stable with $\beta_i^t \ge s_j^t$ for all matched pairs $(i, j)$.

Moreover, $u^t \le u^{t-1}$ and $p^t \ge p^{t-1}$ (with equality for $i \ne i_t$ and $j \ne j_t$). Thus, $\{u_i^t\}_{t \ge 1}$ decreases to a limit $u_i$ for all $i$ and $\{p_j^t\}_{t \ge 1}$ increases to a limit $p_j$ for all $j$.

We claim that if every edge $(i, j) \in M$ occurs infinitely often in $\{(i_t, j_t)\}_{t \ge 1}$, then the limiting $(u, p)$ is balanced. Indeed, let $(i, j) \in M$. Observe that $\beta_i$ and $s_j$ are continuous functions of $(u, p)$. Then $u^t \to u$ and $p^t \to p$ imply that $\beta_i^t \to \beta_i$ and $s_j^t \to s_j$. Since $\beta_i^t = s_j^t$ after every rebalancing of $(i, j)$, i.e., when $(i_t, j_t) = (i, j)$, we deduce that $\beta_i = s_j$.
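The balancing process can be tried out numerically. The following is a minimal sketch, not the authors' code: the function names `excesses` and `balance`, the restriction to perfect matchings, and the $2 \times 2$ instance are assumptions made for illustration. The starting outcome $u = (1, 4)$, $p = (0, 1)$ is the lowest-price minimum cover for this particular $V$, worked out by hand.

```python
def excesses(V, match, u, p):
    """Buyer excesses beta and seller excesses s (Definition 17.5.3).

    match[i] is the seller matched to buyer i; a perfect matching is assumed.
    """
    n = len(V)
    beta = [
        u[i] - max((max(V[i][k] - p[k], 0.0) for k in range(n) if k != match[i]),
                   default=0.0)
        for i in range(n)
    ]
    s = [0.0] * n
    for i in range(n):
        j = match[i]
        s[j] = p[j] - max((max(V[l][j] - u[l], 0.0) for l in range(n) if l != i),
                          default=0.0)
    return beta, s


def balance(V, match, u, p, rounds=200):
    """Cycle through the matched edges, applying the update of Lemma 17.5.6."""
    u, p = list(u), list(p)
    for _ in range(rounds):
        for i in range(len(V)):
            j = match[i]
            beta, s = excesses(V, match, u, p)
            shift = (beta[i] - s[j]) / 2.0
            if shift > 0:  # only balance when beta_i > s_j
                u[i] -= shift
                p[j] += shift
    return u, p


# Hypothetical instance: identity matching is the maximum weight matching (1 + 5 = 6).
V = [[1.0, 2.0], [3.0, 5.0]]
match = [0, 1]
u, p = balance(V, match, u=[1.0, 4.0], p=[0.0, 1.0])
beta, s = excesses(V, match, u, p)
```

On this instance the iterates approach $u = (1/2, 3)$, $p = (1/2, 2)$, where every excess equals $1/2$. Cycling through the edges in a fixed order is one way to pick every edge infinitely often.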


REMARK 17.5.7. In general, balanced outcomes are not unique. 


17.5.1. Positive seller values. Suppose seller $j$ has value $h_j \ge 0$ for his house. We may reduce this case to the case $h_j = 0$ for all $j$ as follows. Let $\tilde v_{ij}$ denote the value buyer $i$ assigns to house $j$. Then $v_{ij} := (\tilde v_{ij} - h_j)_+$ is the surplus generated by the sale of house $j$ to buyer $i$. (If $\tilde v_{ij} \le h_j$, then there is no sale and $v_{ij} = 0$.) If seller $j$ sells the house to buyer $i$ at a price of $h_j + p_j$, then the seller's profit is $p_j$ and the buyer's utility is $u_i = \tilde v_{ij} - (p_j + h_j) = v_{ij} - p_j$. The previous discussion goes through with surpluses replacing values and profits replacing prices.


17.6. Application to weighted hide-and-seek games 


In this section, we show how to apply Theorem 17.1.1 to solve the following game.

EXAMPLE 17.6.1 (Generalized Hide and Seek). We consider a weighted version of the Hide and Seek game. Let $H$ be an $n \times n$ nonnegative matrix. Player II, the robber, chooses a pair $(i, j)$ with $h_{ij} > 0$, and player I, the cop, chooses a row $i$ or a column $j$. The value $h_{ij} > 0$ represents the payoff to the cop if the robber hides at location $(i, j)$ and the cop chooses row $i$ or column $j$. (E.g., certain safehouses might be safer than others, and $h_{ij}$ could represent the probability the cop actually catches the robber if she chooses either $i$ or $j$ when he is hiding at $(i, j)$.) The game is zero-sum.


Consider the following class of player II strategies: Player II first chooses a fixed permutation $M$ of the set $\{1, \ldots, n\}$ and then hides at location $(i, M(i))$ with a probability $q_i$ that he chooses. If $h_{i,M(i)}$ is 0, then $q_i$ is 0. For example, if $n = 5$ and the fixed permutation $M$ is $3, 1, 4, 2, 5$, then the following matrix gives the probability of player II hiding in different locations:

$$\begin{pmatrix}
0 & 0 & q_1 & 0 & 0 \\
q_2 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & q_3 & 0 \\
0 & q_4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & q_5
\end{pmatrix}$$


Given a permutation $M$ and a robber strategy defined by the probability vector $(q_1, \ldots, q_n)$, the payoff to the cop if she chooses row $i$ is $q_i h_{i,M(i)}$, and her payoff if she chooses column $j$ is $q_{M^{-1}(j)} h_{M^{-1}(j),j}$. Obviously, against this robber strategy, the cop will never choose a row $i$ with $h_{i,M(i)} = 0$ or a column $j$ with $h_{M^{-1}(j),j} = 0$ since these would yield her a payoff of 0. Let

$$v_{ij} = \begin{cases} \dfrac{1}{h_{ij}} & \text{if } h_{ij} > 0, \\ 0 & \text{otherwise.} \end{cases}$$

To equalize the payoffs to the cop on rows with $h_{i,M(i)} > 0$ and columns with $h_{M^{-1}(j),j} > 0$ (recall Proposition 2.5.3), the robber can choose

$$q_i = \frac{v_{i,M(i)}}{V_M}, \quad\text{where}\quad V_M = \sum_{i=1}^{n} v_{i,M(i)}.$$

With this choice, each row or column with positive expected payoff yields an expected payoff of $1/V_M$.

If the robber restricts himself to this class of strategies, then to minimize his expected payment to the cop, he should choose the permutation $M^*$ that minimizes $1/V_M$, i.e., maximizes $V_M$. We will show that doing this is optimal for him not just within this restricted class of strategies but in general.


To see this, observe that by Theorem 17.1.1 there is a cover $(u^*, p^*)$, such that $u_i^* + p_j^* \ge v_{ij}$, for which

$$\sum_{i=1}^{n} (u_i^* + p_i^*) = \max_M V_M = V_{M^*}. \qquad (17.12)$$

Now suppose that the cop assigns row $i$ probability $u_i^*/V_{M^*}$ and column $j$ probability $p_j^*/V_{M^*}$ for all $i, j$. Against this strategy, if the robber chooses some $(i, j)$ (necessarily with $h_{ij} > 0$), then the payoff will be

$$\frac{(u_i^* + p_j^*)\, h_{ij}}{V_{M^*}} \ge \frac{v_{ij} h_{ij}}{V_{M^*}} = \frac{1}{V_{M^*}}.$$

We deduce that the cop can guarantee herself a payoff of at least $1/V_{M^*}$, whereas the permutation strategy for the robber described above guarantees that the cop's expected payoff is at most $1/V_{M^*}$. Consequently, this pair of strategies is optimal.


EXAMPLE 17.6.2. Consider the Generalized Hide and Seek game with probabilities given by the following matrix:

$$H = \begin{pmatrix} 1 & 1/2 \\ 1/3 & 1/5 \end{pmatrix}.$$

This means that the matrix $V$ is equal to

$$V = \begin{pmatrix} 1 & 2 \\ 3 & 5 \end{pmatrix}.$$

For this matrix, the maximum matching $M^*$ is given by the identity permutation with $V_{M^*} = 6$. This matrix has a minimum cover $u = (1, 4)$ and $p = (0, 1)$. Thus, an optimal strategy for the robber is to hide at location $(1,1)$ with probability 1/6 and location $(2,2)$ with probability 5/6. An optimal strategy for the cop is to choose row 1 with probability 1/6, row 2 with probability 2/3, and column 2 with probability 1/6. The value of the game is 1/6.
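The arithmetic in this example can be checked with exact rationals. The sketch below is illustrative only: the variable names are assumptions, the maximum weight matching is found by brute force over permutations (reasonable only for tiny $n$), and the cover $(u^*, p^*)$ is taken from the example rather than computed.

```python
from fractions import Fraction as F
from itertools import permutations

# Payoff matrix H from Example 17.6.2 and the derived matrix V (v_ij = 1/h_ij).
H = [[F(1), F(1, 2)], [F(1, 3), F(1, 5)]]
n = len(H)
V = [[1 / h if h > 0 else F(0) for h in row] for row in H]

# Brute-force maximum weight matching M* and its weight V_{M*}.
best = max(permutations(range(n)), key=lambda M: sum(V[i][M[i]] for i in range(n)))
V_best = sum(V[i][best[i]] for i in range(n))

# Robber: hide at (i, M*(i)) with probability q_i = v_{i,M*(i)} / V_{M*}.
q = [V[i][best[i]] / V_best for i in range(n)]
row_payoffs = [q[i] * H[i][best[i]] for i in range(n)]  # cop's payoff per row

# Cop: row i with probability u*_i / V_{M*}, column j with probability p*_j / V_{M*}.
u_star, p_star = [F(1), F(4)], [F(0), F(1)]
cop_payoffs = [
    [(u_star[i] + p_star[j]) * H[i][j] / V_best for j in range(n)]
    for i in range(n)
]
cop_guarantee = min(min(row) for row in cop_payoffs)
```

Both the robber's cap on the cop's payoff (`row_payoffs`) and the cop's guarantee (`cop_guarantee`) come out to 1/6, matching the stated value of the game.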


Notes 


The assignment problem (a.k.a. maximum weighted bipartite matching) has a long and glorious history which is described in detail in Section 17.5 of Schrijver [Sch03]. An important early reference not mentioned there is Jacobi [Jac90b]. Theorem 17.1.1 was proved by Egerváry [Ege31] and led to the development of an algorithm for the assignment problem by Kuhn [Kuh55]. He called it the “Hungarian method” since the ideas were implicit in earlier papers by König and Egerváry. Munkres later sharpened the analysis of the Hungarian method, proving that the algorithm is strongly polynomial, i.e., the number of steps to execute the algorithm does not depend on the weights, but rather is polynomial in $n \times m$. For more about matching theory, including more efficient algorithms, see, e.g., Lovász and Plummer [EP09b] and Schrijver [Sch03].

John von Neumann considered the Generalized Hide and Seek game discussed in §17.6 and used the Minimax Theorem applied to that game to reprove Theorem 17.1.1 (see Exercise 17.4).

The assignment game as described in §17.5 was introduced by Shapley and Shubik [SS71]. They considered a cooperative game, where the payoff $v(S)$ for a set of agents $S$ is $v(S) := \max_{M_S} \sum_{(i,j) \in M_S} v_{ij}$ (the maximum is over matchings $M_S$ of agents in $S$). Using the natural linear programming formulation, they proved that the core of this game,
i.e., the set of stable outcomes, is nonempty, thereby proving They 
also showed that the core of this game coincides with the set of competitive equilibria. (A 
competitive equilibrium, also known as a Walrasian equilibrium, is a vector of prices and 
an allocation of items to agents such that (a) each agent gets an item in his demand set and (b) if an item is not allocated to an agent, then its price is 0.) Furthermore, they proved the lattice property of outcomes (Lemma 17.2.4), thereby showing that there are two extreme points in the core, one best for buyers and one best for sellers.

Demange and Leonard proved Theorem 17.2.6 and showed that the mechanism that elicits buyer valuations and sets prices as prescribed is truthful. This is the VCG mechanism applied to the setting of unit-demand agents.

The auction algorithm for finding a maximum matching from §17.4 follows Demange, Gale, and Sotomayor [DGS86], who considered a number of auction mechanisms, building on work of Crawford and Knoer [CK81].


Gabrielle Demange David Gale Marilda Sotomayor 


There are striking analogies between some of the results in this chapter and those on stable marriage, including the lattice structure, deferred-acceptance algorithms and resilience to manipulation by the proposers. (However, if the matrix $V$ is used to indicate preferences, then the unique stable matching is not necessarily a maximum weight matching.) See Roth and Sotomayor for more on this.

Balanced outcomes were first studied by Sharon Rochford, who proved Theorem 17.5.5 via Brouwer's Fixed-Point Theorem (Theorem 5.1.2). The same notion in general graphs was studied by Kleinberg and Tardos [KT08], who showed that whenever a stable outcome exists, there is also a balanced outcome. They also presented a polynomial time algorithm for finding a balanced outcome when one exists. Azar et al. [ABCT09] showed that the dynamics described in Theorem 17.5.5 converges to a balanced outcome starting from any stable outcome. Kanoria et al. prove convergence to a balanced outcome for a different dynamics in which the matching changes over time.

The rent-division problem discussed in §17.3 was considered by Francis Su [Su99], who showed that there is always an envy-free partition of rent under the “Miserly Tenants” assumption: No tenant will choose the most expensive room if there is a free room available. This assumption is not always reasonable; e.g., consider a house with one excellent room that costs $\epsilon$ and 10 free closets. Other papers on fair rent division include [ADG91, ASU04]. (We have been unable to verify the results in the latter paper.)


Exercises 


17.1. Given the values $V = (v_{ij})$, $1 \le i, j \le n$, let $G$ be a bipartite graph whose
edge set is the union of all maximum weight matchings for $V$. Show that
every perfect matching in $G$ is a maximum weight matching for $V$.


17.2. Use Brouwer’s Fixed-Point Theorem (Theorem 5.1.2) to give a shorter proof 
of the existence of balanced outcomes (Theorem 17.5.5). Hint: Order the 




edges and balance them one by one. 


17.3. If $A = (a_{ij})_{n \times n}$ is a nonnegative matrix with row and column sums at most
1, then there is a doubly stochastic matrix $S_{n \times n}$ with $A \le S$ entrywise.
(A doubly stochastic matrix is a nonnegative matrix with row and column
sums equal to 1.)


17.4. Let $V = (v_{ij})_{n \times n}$ be a nonnegative matrix. Consider the Generalized Hide and Seek game where $h_{ij} = v_{ij}^{-1}$ (or 0 if $v_{ij} = 0$). Prove Theorem 17.1.1 as follows:

(a) Prove that

$$\min_{(u,p) \in K} \sum_i (u_i + p_i) = \frac{1}{w},$$

where $w$ is the value of the game, by showing that $(u, p)$ is a cop strategy ensuring her a payoff at least $\gamma$ if and only if $\frac{1}{\gamma}(u, p)$ is a cover of $V$.

(b) Given a robber optimal strategy $(q_{ij})$ (where all $q_{ij} \ge 0$, $\sum_{i,j} q_{ij} = 1$, and $q_{ij} = 0$ if $v_{ij} = 0$), verify that the $n \times n$ matrix

$$A = (a_{ij}) := \left( \frac{q_{ij} h_{ij}}{w} \right)$$

has row and column sums at most 1. Clearly $w \sum_{i,j} a_{ij} v_{ij} = 1$. Then apply the previous exercise and the Birkhoff-von Neumann Theorem (Exercise 3.4) to complete the proof.
CHAPTER 18


Adaptive decision making 


Suppose that two players are playing multiple rounds of the same game. How 
would they adapt their strategies to the outcomes of previous rounds? This fits 
into the broader framework of adaptive decision making which we develop next and 
later apply to games. We start with a very simple setting. 


18.1. Binary prediction with expert advice and a perfect expert 


EXAMPLE 18.1.1. [Predicting the stock market] Consider a trader trying 
to predict whether the stock market will go up or down each day. Each morning, for 
T days, he solicits the opinions of n experts, who each make up/down predictions. 
Based on their predictions, the trader makes a choice between up and down and 
buys or sells accordingly. 


In this section, we assume that at least one of the experts is perfect, that is, 
predicts correctly every day, but the trader doesn’t know which one it is. What 
should the trader do to minimize the number of mistakes he makes in T days? 


First approach—follow the majority of leaders: 

On any given day, the experts who have never made a mistake are called leaders. By following the majority opinion among the leaders, the trader is guaranteed never to make more than $\log_2 n$ mistakes: Each mistake the trader makes eliminates at least half of the leaders and, obviously, never eliminates the perfect expert.

This analysis is tight when the minority is right each time and has size nearly 
equal to that of the majority. 


Second approach—follow a random leader (FRL): 

Perhaps surprisingly, following a random leader yields a slightly better guarantee: For any $n$, the number of mistakes made by the trader is at most $H_n - 1$ in expectation.$^1$ We verify this by induction on the number of leaders. The case of a single leader is clear. Consider the first day on which some number of experts, say $k > 0$, make a mistake. Then by the induction hypothesis, the expected number of mistakes the trader ever makes is at most

$$\frac{k}{n} + H_{n-k} - 1 \le H_n - 1.$$

This analysis is tight for $T \ge n$. Suppose that for $1 \le i < n$, on day $i$, only expert $i$ makes a mistake. Then the probability that the trader makes a mistake that day is $1/(n - i + 1)$. Thus, the expected number of mistakes he makes is $H_n - 1$.

$^1$ $H_n := \sum_{i=1}^{n} \frac{1}{i} \in (\ln n, \ln n + 1)$.
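The tight example for FRL can be checked with exact arithmetic. This is a small sketch under stated assumptions: the helper names `expected_frl_mistakes` and `harmonic` are hypothetical, experts are indexed from 0, and the scenario is described by the set of experts who are wrong on each day (at least one expert must never be wrong).

```python
from fractions import Fraction


def expected_frl_mistakes(n, wrong_by_day):
    """Exact expected number of mistakes of follow-a-random-leader.

    wrong_by_day[t] is the set of experts (0..n-1) who err on day t.
    Each day the trader copies a uniformly random current leader, so the
    probability of a mistake is the fraction of leaders who err that day.
    """
    leaders = set(range(n))
    expected = Fraction(0)
    for wrong in wrong_by_day:
        wrong_leaders = leaders & set(wrong)
        expected += Fraction(len(wrong_leaders), len(leaders))
        leaders -= wrong_leaders
    return expected


def harmonic(n):
    """H_n = 1 + 1/2 + ... + 1/n as an exact fraction."""
    return sum(Fraction(1, i) for i in range(1, n + 1))


# Tight scenario from the text: on day i only expert i errs; the last expert is perfect.
n = 5
tight = [{i} for i in range(n - 1)]
result = expected_frl_mistakes(n, tight)
```

For $n = 5$ the computed expectation equals $H_5 - 1 = 77/60$, in agreement with the bound.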
REMARK 18.1.2. We can think of this setting as an extensive-form zero-sum
game between an adversary and a trader. The adversary chooses the daily advice 
of the experts and the actual outcome on each day t, and the trader chooses a 
prediction each day based on the experts’ advice. In this game, the adversary seeks 
to maximize his gain, the number of mistakes the trader makes. 


Next we derive a lower bound on the expected number of mistakes made by 
any trader algorithm, by presenting a strategy for the adversary. 


PROPOSITION 18.1.3. In the setting of Example 18.1.1 with at least one perfect expert, there is an adversary strategy that causes any trader algorithm to incur at least $\lfloor \log_2 n \rfloor / 2 \ge \lfloor \log_4 n \rfloor$ mistakes in expectation.

PROOF. Let $2^k \le n < 2^{k+1}$. Let $E_0$ denote the set consisting of the first $2^k$ experts, and let $E_t$ be the subset of experts in $E_{t-1}$ that predicted correctly on day $t$.

Now suppose that for each $t \in \{1, \ldots, k\}$, on day $t$ half of the experts in $E_{t-1}$ predict up and half predict down, and the rest of the experts predict arbitrarily. Suppose also that the truth is equally likely to be up or down. Then no matter how the trader chooses up or down, with probability 1/2, he makes a mistake. Thus, in the first $k$ days, any trader algorithm makes $k/2$ mistakes in expectation. Moreover, after $k = \lfloor \log_2 n \rfloor$ days, there is still a perfect expert, and the expected number of mistakes made by the trader is $\lfloor \log_2 n \rfloor / 2 \ge \lfloor \log_4 n \rfloor$.


To prove a matching upper bound, we will take a middle road between following 
the majority of the leaders (ignoring the minority) and FRL (which weights the 
minority too highly). 


Third approach—a function of majority size: 

Given any function $p : [1/2, 1] \to [1/2, 1]$, consider the trader algorithm $A_p$: When the leaders are split on their advice in proportion $(x, 1 - x)$ with $x \ge 1/2$, follow the majority with probability $p(x)$.

If $p(x) = 1$ for all $x > 1/2$, we get the deterministic majority vote, while if $p(x) = x$, we get FRL.

What is the largest $a > 1$ for which we can prove an upper bound of $\log_a n$ on the expected number of mistakes? To do so, by induction, we need to verify two inequalities for all $x \in [1/2, 1]$:

$$\log_a(nx) + 1 - p(x) \le \log_a n, \qquad (18.1)$$
$$\log_a(n(1 - x)) + p(x) \le \log_a n. \qquad (18.2)$$

The left-hand side of (18.1) is an upper bound on the expected mistakes of $A_p$ assuming the majority is right (using the induction hypothesis) and the left-hand side of (18.2) is an upper bound on the expected mistakes of $A_p$ assuming the minority is right.

Adding these inequalities and setting $x = 1/2$ gives $2\log_a(1/2) + 1 \le 0$; that is, $a \le 4$. We already know this since $\lfloor \log_4 n \rfloor$ is a lower bound for the worst case performance. Setting $a = 4$, the two required inequalities become

$$p(x) \ge 1 + \log_4 x,$$
$$p(x) \le -\log_4(1 - x).$$
FIGURE 18.1. This figure illustrates the different strategies for the function $p(x)$ on $[1/2, 1]$: follow majority, the upper bound $-\log_4(1 - x)$, follow random leader, and the lower bound $1 + \log_4 x$.


We can easily satisfy both of these inequalities, e.g., by taking $p(x) = 1 + \log_4 x$, since $x(1 - x) \le 1/4$. Since $p(1/2) = 1/2$ and $p(1) = 1$, concavity of the logarithm implies that $p(x) > x$ for $x \in (1/2, 1)$.


THEOREM 18.1.4. In the binary prediction problem with $n$ experts including a perfect expert, consider the trader algorithm $A_p$ that follows the majority of leaders with probability $p(x) = 1 + \log_4 x$ when that majority comprises a fraction $x$ of leaders. Then for any horizon $T$, the expected number of mistakes made by $A_p$ is at most $\log_4 n$.

Comparing this to Proposition 18.1.3 shows that when $n$ is a power of two, we have identified minimax optimal strategies for the trader and the adversary.
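The bound $\log_4 n$ can be sanity-checked by a small dynamic program over the number of remaining leaders. This is a sketch under stated assumptions: the function name `worst_case` is hypothetical, and the adversary considered here picks, each day, a split of the $m$ current leaders into a majority of size $a$ and a minority of size $m - a \ge 1$, and then decides which side is right. Days on which all leaders agree cost the trader nothing and are skipped.

```python
import math
from functools import lru_cache


def p(x):
    """Probability of following a majority that is a fraction x of the leaders."""
    return 1.0 + math.log(x, 4)


@lru_cache(maxsize=None)
def worst_case(m):
    """Largest expected number of future mistakes of A_p with m leaders left."""
    if m == 1:
        return 0.0
    best = 0.0
    for a in range((m + 1) // 2, m):  # majority size a, with m/2 <= a < m
        follow = p(a / m)
        majority_right = (1.0 - follow) + worst_case(a)
        minority_right = follow + worst_case(m - a)
        best = max(best, majority_right, minority_right)
    return best


values = {m: worst_case(m) for m in range(1, 65)}
```

In this model the recursion returns $\log_4 m$ (up to floating-point error) for every $m$ tested, so the inequality (18.1) is tight for the choice $p(x) = 1 + \log_4 x$.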


FIGURE 18.2. Expert advice. 
18.2. Nobody is perfect


Unfortunately, the assumption that there is a perfect expert is unrealistic. In the setting of Example 18.1.1, let $L_i^t$ be the cumulative loss (i.e., total number of mistakes) incurred by expert $i$ on the first $t$ days. Denote

$$L_*^t = \min_i L_i^t \quad\text{and}\quad S_j = \{ t \mid L_*^t = j \};$$

i.e., $S_j$ is the time period during which the best expert has made $j$ mistakes.

It is natural (but far from optimal) to apply the approach of the previous section: Suppose that, for each $t$, on day $t + 1$ the trader follows the majority opinion of the leaders, i.e., those experts with $L_i^t = L_*^t$. Then during $S_j$, the trader's loss is at most $\log_2 n + 1$ (by the discussion of the case where there is a perfect expert). Thus, for any number $T$ of days, the trader's loss is bounded by $(\log_2 n + 1)(L_*^T + 1)$. Similarly, the expected loss of FRL is at most $H_n (L_*^T + 1)$ and the expected loss of the third approach above is at most $(\log_4 n + 1)(L_*^T + 1)$.


REMARK 18.2.1. There is an adversary strategy that ensures that any trader algorithm that only uses the advice of the leading experts will incur an expected loss that is at least $\lfloor \log_4 n \rfloor L_*^T$ in $T$ steps. See Exercise 18.1.

18.2.1. Weighted majority. There are strategies that guarantee the trader an asymptotic loss that is at most twice that of the best expert. One such strategy is based on weighted majority, where the weight assigned to an expert is decreased by a factor of $1 - \epsilon$ each time he makes a mistake.

Weighted Majority Algorithm

Fix $\epsilon \in [0, 1]$. On each day $t$, associate a weight $w_i^t$ with each expert $i$.

Initially, set $w_i^0 = 1$ for all $i$.

Each day $t$, follow the weighted majority opinion: Let $U_t$ be the set of experts predicting up on day $t$, and $D_t$ the set predicting down. Predict “up” on day $t$ if

$$W_U(t-1) := \sum_{i \in U_t} w_i^{t-1} \ge W_D(t-1) := \sum_{i \in D_t} w_i^{t-1}$$

and “down” otherwise.

At the end of day $t$, for each $i$ such that expert $i$ predicted incorrectly on day $t$, set

$$w_i^t = (1 - \epsilon)\, w_i^{t-1}. \qquad (18.3)$$

Thus, $w_i^t = (1 - \epsilon)^{L_i^t}$, where $L_i^t$ is the number of mistakes made by expert $i$ in the first $t$ days.
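The algorithm in the box translates directly into a few lines of code. The following is a minimal sketch, not a reference implementation: the function name `weighted_majority`, the 0/1 encoding (1 for up, 0 for down), and the synthetic advice (expert $i$ errs independently with probability $(i+1)/(n+2)$, generated from a fixed seed) are assumptions made for illustration.

```python
import math
import random


def weighted_majority(advice, outcomes, eps):
    """Run the Weighted Majority Algorithm.

    advice[t][i] in {0, 1} is expert i's prediction on day t (1 = up);
    outcomes[t] in {0, 1} is the truth. Returns (trader mistakes, expert mistakes).
    """
    n = len(advice[0])
    w = [1.0] * n
    mistakes = 0
    expert_mistakes = [0] * n
    for preds, truth in zip(advice, outcomes):
        w_up = sum(w[i] for i in range(n) if preds[i] == 1)
        w_down = sum(w[i] for i in range(n) if preds[i] == 0)
        guess = 1 if w_up >= w_down else 0  # predict "up" on ties, as in the box
        if guess != truth:
            mistakes += 1
        for i in range(n):
            if preds[i] != truth:  # update (18.3)
                w[i] *= 1.0 - eps
                expert_mistakes[i] += 1
    return mistakes, expert_mistakes


# Synthetic data (an assumption for illustration only).
rng = random.Random(0)
n, T, eps = 8, 300, 0.25
outcomes = [rng.randint(0, 1) for _ in range(T)]
advice = [
    [truth if rng.random() > (i + 1) / (n + 2) else 1 - truth for i in range(n)]
    for truth in outcomes
]
L, Li = weighted_majority(advice, outcomes, eps)
bound = 2 * (1 + eps) * min(Li) + 2 * math.log(n) / eps
```

The quantity `bound` is the right-hand side of the guarantee $2(1+\epsilon) L_i^T + 2\ln n/\epsilon$ evaluated at the best expert, so `L <= bound` should hold on any input with $\epsilon \le 1/2$.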


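For the analysis of this algorithm, we will use the following facts.

LEMMA 18.2.2. Let $\epsilon \in [0, 1/2]$. Then $\epsilon \le -\ln(1 - \epsilon) \le \epsilon + \epsilon^2$.

PROOF. Taylor expansion gives

$$-\ln(1 - \epsilon) = \sum_{k \ge 1} \frac{\epsilon^k}{k} \ge \epsilon.$$

On the other hand,

$$\sum_{k \ge 1} \frac{\epsilon^k}{k} \le \epsilon + \frac{\epsilon^2}{2} \sum_{k \ge 0} \epsilon^k \le \epsilon + \epsilon^2$$

since $\epsilon \le 1/2$.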


THEOREM 18.2.3. Suppose that there are $n$ experts. Let $L(T)$ be the number of mistakes made by the Weighted Majority Algorithm in $T$ steps with $\epsilon \le \frac{1}{2}$ and let $L_i^T$ be the number of mistakes made by expert $i$ in $T$ steps. Then for any sequence of up/down outcomes and for every expert $i$, we have

$$L(T) \le 2(1 + \epsilon) L_i^T + \frac{2 \ln n}{\epsilon}. \qquad (18.4)$$

PROOF. Let $W(t) = \sum_i w_i^t$ be the total weight on all the experts after the $t$-th day. If the algorithm incurs a loss on the $t$-th day, say by predicting up instead of correctly predicting down, then $W_U(t-1) \ge \frac{1}{2} W(t-1)$. But in that case

$$W(t) \le W_D(t-1) + (1 - \epsilon) W_U(t-1) \le \left(1 - \frac{\epsilon}{2}\right) W(t-1).$$

Thus, after a total loss of $L := L(T)$,

$$W(T) \le \left(1 - \frac{\epsilon}{2}\right)^{L} W(0) = \left(1 - \frac{\epsilon}{2}\right)^{L} n.$$

Now consider expert $i$ who incurs a loss of $L_i := L_i^T$. His weight at the end is

$$w_i^T = (1 - \epsilon)^{L_i},$$

which is at most $W(T)$. Thus

$$(1 - \epsilon)^{L_i} \le \left(1 - \frac{\epsilon}{2}\right)^{L} n.$$

Taking logs and negating, we have

$$-L_i \ln(1 - \epsilon) \ge -L \ln\left(1 - \frac{\epsilon}{2}\right) - \ln n. \qquad (18.5)$$

Applying Lemma 18.2.2, we obtain that for $\epsilon \in (0, 1/2]$,

$$L_i(\epsilon + \epsilon^2) \ge L \frac{\epsilon}{2} - \ln n$$

or

$$L(T) \le 2(1 + \epsilon) L_i^T + \frac{2 \ln n}{\epsilon}.$$
REMARKS 18.2.4.
(1) It follows from (18.5) that for all $\epsilon \in (0, 1]$

$$L(T) \le \frac{|\ln(1 - \epsilon)|\, L_i^T + \ln n}{|\ln(1 - \frac{\epsilon}{2})|}. \qquad (18.6)$$

If we know in advance that there is an expert with $L_i^T = 0$, then letting $\epsilon \uparrow 1$ recovers the result of the first approach to Example 18.1.1.

(2) There are cases where the Weighted Majority Algorithm incurs at least twice the loss of the best expert. In fact, this holds for every deterministic algorithm. See Exercise 18.3.
18.3. Multiple choices and varying costs


“I hear the voices, and I read the front page, and I know the speculation. But
I’m the decider, and I decide what is best.” George W. Bush


In the previous section, the decision maker used the advice of n experts to 
choose between two options, and the cost of any mistake was the same. We saw 
that a simple deterministic algorithm could guarantee that the number of mistakes 
was not much more than twice that of any expert. One drawback of the Weighted 
Majority Algorithm is that it treats a majority of 51% with the same reverence as 
a majority of 99%. With careful randomization, we can avoid this pitfall and show 
that the decision maker can do almost as well as the best expert. 

In this section, the decider faces multiple options, e.g., which stock to buy, 
rather than just up or down, now with varying losses. We will refer to the options 
of the decider as actions: This covers the task of prediction with expert advice, as 
the $i$-th action could be “follow the advice of expert $i$”.


EXAMPLE 18.3.1 (Route-picking). Each day you choose one of a set of n 
routes from your house to work. Your goal is to minimize your travel time. However, 
traffic is unpredictable, and you do not know in advance how long each route will 
take. Once you choose your route, you incur a loss equal to the latency on the 
route you selected. This continues for $T$ days. For $1 \le i \le n$, let $L_i^T$ be the total
latency you would have incurred over the $T$ days if you had taken route $i$ every day.
Can we find a strategy for choosing a route each day such that the total latency
incurred is close to $\min_i L_i^T$?


The following setup captures the stock market and route-picking examples 
above and many others. 


DEFINITION 18.3.2 (Sequential adaptive decision making). On day $t$, a decider $D$ chooses a probability distribution $p^t = (p_1^t, \ldots, p_n^t)$ over a set of $n$ actions, e.g., stocks to own or routes to drive. (The choice of $p^t$ can depend on the history, i.e., the prior losses of each action and prior actions taken by $D$.) The losses $\ell^t = (\ell_1^t, \ldots, \ell_n^t) \in [0, 1]^n$ of each action on day $t$ are then revealed.

Given the history, $D$'s expected loss on day $t$ is $p^t \cdot \ell^t = \sum_i p_i^t \ell_i^t$. The total expected loss $D$ incurs in $T$ days is

$$\overline{L}_D^T = \sum_{t=1}^{T} p^t \cdot \ell^t.$$

(See the chapter notes for a more precise interpretation of $\overline{L}_D^T$ in the case where losses depend on prior actions taken by $D$.)


REMARK 18.3.3. In stock-picking examples, $D$ could have a fraction $p_i^t$ of his portfolio in stock $i$ instead of randomizing.


DEFINITION 18.3.4. The regret of a decider $D$ in $T$ steps against loss sequence $\ell = \{\ell^t\}_{t=1}^{T}$ is defined as the difference between the total expected loss of the decider and the total loss of the best single action; that is,

$$R_T(D, \ell) := \overline{L}_D^T - \min_i L_i^T,$$

where $L_i^T = \sum_{t=1}^{T} \ell_i^t$. We define the regret of a decider $D$ as

$$R_T(D) := \max_{\ell} R_T(D, \ell). \qquad (18.7)$$

Perhaps surprisingly, there exist algorithms with regret that is sublinear in 
T; i.e., the average regret per day tends to 0. We will see one below. 


18.3.1. Discussion. Let $\ell = \{\ell^t\}_{t=1}^{T}$ be a sequence of loss vectors. A natural goal for a decision-making algorithm $D$ is to minimize its worst case loss, i.e., $\max_{\ell} \overline{L}_D^T$. But this is a dubious measure of the quality of the algorithm since on a worst-case sequence there may be nothing any decider can do. This motivates evaluating $D$ by its performance gap

$$\max_{\ell} \left( \overline{L}_D^T - B(\ell) \right),$$

where $B(\ell)$ is a benchmark loss for $\ell$. The most obvious choice for the benchmark is $B^* = \sum_{t=1}^{T} \min_i \ell_i^t$, but this is too ambitious: E.g., if $n = 2$, the losses of the first expert $\{\ell_1^t\}_{t=1}^{T}$ are independent unbiased bits, and $\ell_2^t = 1 - \ell_1^t$, then $\mathbb{E}[\overline{L}_D^T - B^*] = T/2$ since $B^* = 0$. Instead, in the definition of regret, we employ the benchmark $B(\ell) = \min_i L_i^T$. At first sight, this benchmark looks weak since why should choosing the same action every day be a reasonable option? We give two answers: (1) Often there really is a best action and the goal of the decision algorithm is to learn its identity without losing too much in the process. (2) Alternative decision algorithms (e.g., use action 1 on odd days, action 2 on even days, except if one action has more than double the cumulative loss of the other) can be considered experts and incorporated into the model as additional actions. We show below that there is a decision algorithm whose regret is at most $\sqrt{T \log n / 2}$ when there are $n$ actions to choose from. Note that this grows linearly in $T$ if $n$ is exponential in $T$. To see that this is unavoidable, recall that in the binary prediction setting, if we include $2^T$ experts making all possible predictions, one of them will make no mistakes, and we already know that for this case, any decision algorithm will incur worst-case regret at least $T/2$.


18.3.2. The Multiplicative Weights Algorithm. We now present an algorithm for adaptive decision making, with regret that is $o(T)$ as $T \to \infty$. The algorithm is a randomized variant of the Weighted Majority Algorithm; it uses the weights in that algorithm as probabilities. The algorithm and its analysis in Theorem 18.3.7 deal with the case where the decider incurs losses only.

Multiplicative Weights Algorithm (MW)

Fix $\epsilon < 1/2$ and $n$ possible actions.

On each day $t$, associate a weight $w_i^t$ with the $i$-th action. Initially, $w_i^0 = 1$ for all $i$.

On day $t$, use the mixed strategy $p^t$, where

$$p_i^t = \frac{w_i^{t-1}}{\sum_k w_k^{t-1}}.$$

For each action $i$, with $1 \le i \le n$, observe the loss $\ell_i^t \in [0, 1]$ and update the weight $w_i^t$ as follows:

$$w_i^t = w_i^{t-1} \exp(-\epsilon \ell_i^t). \qquad (18.8)$$
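A short implementation makes the update rule concrete. This is a sketch under stated assumptions rather than a definitive implementation: the function name `multiplicative_weights` and the synthetic losses (action $i$ draws losses uniformly from $[0, (i+1)/n]$ using a fixed seed) are chosen for illustration, and the step size is the value $\epsilon = \sqrt{8 \log n / T}$ used in the regret analysis.

```python
import math
import random


def multiplicative_weights(losses, eps):
    """Run MW on a loss sequence; losses[t][i] in [0, 1].

    Returns (total expected loss of MW, list of total losses per action).
    """
    n = len(losses[0])
    w = [1.0] * n
    total = 0.0
    for loss in losses:
        W = sum(w)
        p = [wi / W for wi in w]                     # mixed strategy for this day
        total += sum(pi * li for pi, li in zip(p, loss))
        w = [wi * math.exp(-eps * li) for wi, li in zip(w, loss)]  # update (18.8)
    action_totals = [sum(loss[i] for loss in losses) for i in range(n)]
    return total, action_totals


# Synthetic losses (an assumption for illustration only).
rng = random.Random(1)
n, T = 5, 1000
losses = [[rng.random() * (i + 1) / n for i in range(n)] for _ in range(T)]
eps = math.sqrt(8 * math.log(n) / T)
L_mw, L_actions = multiplicative_weights(losses, eps)
regret = L_mw - min(L_actions)
```

For very long horizons the raw weights can underflow; a common remedy, not shown here, is to renormalize the weights each day, which leaves the probabilities $p^t$ unchanged.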
In the next proof we will use the following lemma.

LEMMA 18.3.5 (Hoeffding Lemma). Suppose that $X$ is a random variable with distribution $F$ such that $a \le X \le a + 1$ for some $a \le 0$ and $\mathbb{E}[X] = 0$. Then for all $\lambda \in \mathbb{R}$,

$$\mathbb{E}\left[e^{\lambda X}\right] \le e^{\lambda^2/8}.$$

For a proof, see Appendix B.2.1. For the reader's convenience, we prove here the following slightly weaker version.


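LEMMA 18.3.6. Let $X$ be a random variable with $\mathbb{E}[X] = 0$ and $|X| \le 1$. Then for all $\lambda \in \mathbb{R}$,

$$\mathbb{E}\left[e^{\lambda X}\right] \le e^{\lambda^2/2}.$$

PROOF. By convexity of the function $f(x) = e^{\lambda x}$, we have

$$e^{\lambda x} \le \frac{(1 + x) e^{\lambda} + (1 - x) e^{-\lambda}}{2} =: \ell(x)$$

for $x \in [-1, 1]$. See Figure 18.3.

FIGURE 18.3. The blue curve is $e^{\lambda x}$. The purple curve is a line between $e^{-\lambda}$ and $e^{\lambda}$.

Thus, since $|X| \le 1$ and $\mathbb{E}[X] = 0$, we have

$$\mathbb{E}\left[e^{\lambda X}\right] \le \mathbb{E}[\ell(X)] = \frac{e^{\lambda} + e^{-\lambda}}{2} = \sum_{k=0}^{\infty} \frac{\lambda^{2k}}{(2k)!} \le \sum_{k=0}^{\infty} \frac{\lambda^{2k}}{2^k k!} = e^{\lambda^2/2}.$$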

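THEOREM 18.3.7. Consider the Multiplicative Weights Algorithm with $n$ actions. Define

$$\overline{L}_{MW}^T := \sum_{t=1}^{T} p^t \cdot \ell^t,$$

where $\ell^t \in [0, 1]^n$. Then, for every loss sequence $\{\ell^t\}_{t=1}^{T}$ and every action $i$, we have

$$\overline{L}_{MW}^T \le L_i^T + \frac{T\epsilon}{8} + \frac{\log n}{\epsilon},$$

where $L_i^T = \sum_{t=1}^{T} \ell_i^t$. In particular, taking $\epsilon = \sqrt{\frac{8 \log n}{T}}$, we obtain that for all $i$,

$$\overline{L}_{MW}^T \le L_i^T + \sqrt{\tfrac{1}{2} T \log n};$$

i.e., the regret $R_T(MW, \ell)$ is at most $\sqrt{\tfrac{1}{2} T \log n}$.

PROOF. Let $W^t = \sum_{1 \le i \le n} w_i^t = \sum_{1 \le i \le n} w_i^{t-1} \exp(-\epsilon \ell_i^t)$. Then

$$\frac{W^t}{W^{t-1}} = \sum_i \frac{w_i^{t-1}}{W^{t-1}} \exp(-\epsilon \ell_i^t) = \sum_i p_i^t \exp(-\epsilon \ell_i^t) = \mathbb{E}\left[e^{-\epsilon X_t}\right], \qquad (18.9)$$

where $X_t$ is the loss the algorithm incurs at time $t$; i.e., $\mathbb{P}[X_t = \ell_i^t] = p_i^t$.

Let

$$\overline{\ell}^t := \mathbb{E}[X_t] = p^t \cdot \ell^t.$$

By Hoeffding's Lemma (Lemma 18.3.5), we have

$$\mathbb{E}\left[e^{-\epsilon X_t}\right] = e^{-\epsilon \overline{\ell}^t}\, \mathbb{E}\left[e^{-\epsilon (X_t - \overline{\ell}^t)}\right] \le e^{-\epsilon \overline{\ell}^t} e^{\epsilon^2/8},$$

so plugging back into (18.9) we obtain

$$W^t \le e^{-\epsilon \overline{\ell}^t} e^{\epsilon^2/8}\, W^{t-1} \quad\text{and thus}\quad W^T \le e^{-\epsilon \overline{L}_{MW}^T} e^{T\epsilon^2/8}\, n.$$

On the other hand,

$$W^T \ge w_i^T = e^{-\epsilon L_i^T},$$

so combining these two inequalities, we obtain

$$e^{-\epsilon L_i^T} \le e^{-\epsilon \overline{L}_{MW}^T} e^{T\epsilon^2/8}\, n.$$

Taking logs, we obtain

$$\overline{L}_{MW}^T \le L_i^T + \frac{T\epsilon}{8} + \frac{\log n}{\epsilon}.$$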

The following proposition shows that the bound of Theorem 18.3.7 is asymptotically optimal as $T$ and $n$ go to infinity.

PROPOSITION 18.3.8. Consider a loss sequence $\ell$ in which all losses are independent and equally likely to be 0 or 1. Then for any decision algorithm $D$, its expected regret satisfies

$$\mathbb{E}[R_T(D, \ell)] = \frac{\gamma_n}{2} \sqrt{T}\, (1 + o(1)) \quad\text{as } T \to \infty, \qquad (18.10)$$

where

$$\gamma_n = \mathbb{E}\left[\max_{1 \le i \le n} Y_i\right] \quad\text{and}\quad Y_i \sim N(0, 1).$$

Moreover,

$$\gamma_n = \sqrt{2 \log n}\, (1 + o(1)) \quad\text{as } n \to \infty. \qquad (18.11)$$
PROOF. Clearly, for any $D$, the decider's expected loss is $T/2$. Expert $i$'s loss, $L_i^T$, is binomial with parameters $T$ and $1/2$, and thus by the Central Limit Theorem $\frac{L_i^T - T/2}{\sqrt{T/4}}$ converges in law to a normal $(0, 1)$ random variable $Y_i$. Let $L_*^T = \min_i L_i^T$. Then as $T \to \infty$,

$$\mathbb{E}\left[\frac{L_*^T - T/2}{\sqrt{T/4}}\right] \to \mathbb{E}\left[\min_{1 \le i \le n} Y_i\right] = -\gamma_n,$$

which proves (18.10). See Exercise 18.7 for the proof of (18.11).


18.3.3. Gains. Consider a setting with gains instead of losses, where $g^t = (g_1^t, \ldots, g_n^t)$ are the gains of the $n$ experts on day $t$. As in the setting of losses, let

$$\overline{G}_D^T = \sum_{t=1}^{T} p^t \cdot g^t$$

be the expected gain of decider $D$, and let $G_i^T := \sum_{t=1}^{T} g_i^t$ be the total gain of expert $i$ over the $T$ days.

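COROLLARY 18.3.9. In the setup of Definition 18.3.2, suppose that for all $t$, $\max_i g_i^t - \min_j g_j^t \le \Gamma$. Then the Multiplicative Weights Algorithm can be adapted to the gains setting and yields regret at most $\Gamma \sqrt{\frac{T \log n}{2}}$; i.e.,

$$\overline{G}_{MW}^T \ge G_j^T - \Gamma \sqrt{\frac{T \log n}{2}}$$

for all $j$.

PROOF. Let $g_{\max}^t = \max_k g_k^t$. Run the Multiplicative Weights Algorithm using the (relative) losses

$$\ell_i^t = \frac{1}{\Gamma} \left( g_{\max}^t - g_i^t \right).$$

By Theorem 18.3.7, we have for all actions $j$ that

$$\frac{1}{\Gamma} \sum_t \sum_i p_i^t \left( g_{\max}^t - g_i^t \right) \le \sqrt{\frac{T \log n}{2}} + \frac{1}{\Gamma} \sum_t \left( g_{\max}^t - g_j^t \right),$$

and the corollary follows.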


18.4. Using adaptive decision making to play zero-sum games 


Consider a two-person zero-sum game with payoff matrix $A = \{a_{ij}\}$. Suppose $T$ rounds of this game are played. We can apply the Multiplicative Weights Algorithm to the decision-making process of player II. In round $t$, he chooses a mixed strategy $p^t$; i.e., column $j$ is assigned probability $p_j^t$. Knowing $p^t$ and the history of play, player I chooses a row $i_t$. The loss of action $j$ in round $t$ is $\ell_j^t = a_{i_t j}$, and the total loss of action $j$ in $T$ rounds is $L_j^T = \sum_{t=1}^{T} a_{i_t j}$.

The next proposition bounds the total loss $\overline{L}_{MW}^T = \sum_{t=1}^{T} (A p^t)_{i_t}$ of player II.


PROPOSITION 18.4.1. Suppose the $m \times n$ payoff matrix $A = \{a_{ij}\}$ has entries
in $[0,1]$ and player II is playing according to the Multiplicative Weights Algorithm.
Let $x^T_{\mathrm{emp}} \in \Delta_m$ be a row vector representing the empirical
distribution of actions taken by player I in $T$ steps; i.e., the $i$th coordinate of
$x^T_{\mathrm{emp}}$ is $|\{t : i_t = i\}|/T$. Then the total loss of player II satisfies

\[ \overline{L}_{MW}^T \le T \min_y x^T_{\mathrm{emp}} A y + \sqrt{\frac{T\log n}{2}}. \]


PROOF. By Theorem 18.3.7, player II's loss over the $T$ rounds satisfies

\[ \overline{L}_{MW}^T \le L_j^T + \sqrt{\frac{T\log n}{2}} \quad \text{for every column } j. \]

Since

\[ L_j^T = T \sum_{t=1}^T \frac{a_{i_t j}}{T} = T\,(x^T_{\mathrm{emp}} A)_j, \]

we have

\[ \min_j L_j^T = T \min_j (x^T_{\mathrm{emp}} A)_j = T \min_y x^T_{\mathrm{emp}} A y, \]

and the proposition follows.
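
The bound of Proposition 18.4.1 can be checked numerically. The sketch below is illustrative only: it assumes the exponential-weights form of the Multiplicative Weights Algorithm with $\epsilon = \sqrt{8\log n/T}$, lets player I best-respond in every round (one possible choice of rows), and uses the hypothetical name `mw_vs_best_response`.

```python
import math

def mw_vs_best_response(A, T):
    """Player II runs MW over columns of A; player I best-responds to p^t each round.

    A: m x n payoff matrix with entries in [0, 1] (list of rows).
    Returns (total loss of player II, upper bound from the proposition, x_emp).
    """
    m, n = len(A), len(A[0])
    eps = math.sqrt(8 * math.log(n) / T)   # assumed tuning of the MW parameter
    cum = [0.0] * n                        # cumulative column losses L_j^{t-1}
    row_counts = [0] * m
    total_loss = 0.0
    for _ in range(T):
        base = min(cum)
        w = [math.exp(-eps * (c - base)) for c in cum]
        s = sum(w)
        p = [x / s for x in w]
        payoff = [sum(A[i][j] * p[j] for j in range(n)) for i in range(m)]  # A p^t
        i_t = max(range(m), key=lambda i: payoff[i])                        # best response
        total_loss += payoff[i_t]
        row_counts[i_t] += 1
        for j in range(n):
            cum[j] += A[i_t][j]
    x_emp = [c / T for c in row_counts]
    # min_y x_emp A y is attained at a pure column, so a minimum over j suffices.
    best_col = min(sum(x_emp[i] * A[i][j] for i in range(m)) for j in range(n))
    bound = T * best_col + math.sqrt(T * math.log(n) / 2)
    return total_loss, bound, x_emp
```

For the $2 \times 2$ identity matrix (value $1/2$), the total loss should lie between $T/2$ and the returned bound.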


REMARK 18.4.2. Suppose player I uses the mixed strategy $\xi$ (a row vector) in
all $T$ rounds. If player II knows this, he can guarantee an expected loss of

\[ \min_y \xi A y, \]

which could be lower than $V$, the value of the game. In this case,
$\mathbb{E}(x^T_{\mathrm{emp}}) = \xi$, so even if player II has no knowledge of $\xi$,
the proposition bounds his expected loss by

\[ T \min_y \xi A y + \sqrt{\frac{T\log n}{2}}. \]


Next, as promised, we rederive the Minimax Theorem as a corollary of
Proposition 18.4.1.


THEOREM 18.4.3 (Minimax Theorem). Let $A = \{a_{ij}\}$ be the payoff matrix of
a zero-sum game. Let

\[ V_I = \max_x \min_y x^T A y = \max_x \min_j (x^T A)_j \]

and

\[ V_{II} = \min_y \max_x x^T A y = \min_y \max_i (Ay)_i \]

be the safety values of the players. Then $V_I = V_{II}$.


PROOF. By adding a constant to all entries of the matrix and scaling, we may
assume that all entries of $A$ are in $[0,1]$.

From Lemma 2.6.3 we have $V_I \le V_{II}$.

Suppose that in round $t$ player II plays the mixed strategy $p^t$ given by the
Multiplicative Weights Algorithm and player I plays a best response; i.e.,

\[ i_t = \operatorname{argmax}_i (Ap^t)_i. \]

Then player II's loss in round $t$ is

\[ (Ap^t)_{i_t} = \max_i (Ap^t)_i \ge \min_y \max_i (Ay)_i = V_{II}, \]

from which

\[ \overline{L}_{MW}^T \ge T V_{II}. \tag{18.12} \]

(Note that the proof of (18.12) did not rely on any property of the Multiplicative
Weights Algorithm.) On the other hand, from Proposition 18.4.1 we have

\[ \overline{L}_{MW}^T \le T \min_y x^T_{\mathrm{emp}} A y + \sqrt{\frac{T\log n}{2}}, \]

and since $\min_y x^T_{\mathrm{emp}} A y \le V_I$, we obtain

\[ T V_{II} \le T V_I + \sqrt{\frac{T\log n}{2}}. \]

Dividing by $T$ and letting $T \to \infty$ gives $V_{II} \le V_I$.


18.5. Adaptive decision making as a zero-sum game* 


Our goal in this section is to characterize the minimax regret in the setting of
Definition 18.3.2, i.e.,

\[ \min_{D_{[0,1]}} \max_{\{\ell^t\}} R_T\big(D_{[0,1]}, \{\ell^t\}\big), \tag{18.13} \]

as the value of a finite zero-sum game between a decider and an adversary. In
(18.13) the sequence of loss vectors $\{\ell^t\}_{t=1}^T$ is in $[0,1]^{nT}$, and
$D_{[0,1]}$ is a sequence of functions $\{p^t\}_{t=1}^T$, where

\[ p^t : [0,1]^{n(t-1)} \to \Delta_n \]

maps the losses from previous rounds to the decider's current mixed strategy over
actions.


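18.5.1. Minimax regret is attained in {0,1} losses. Given
$\{\ell_i^t : 1 \le i \le n,\ 1 \le t \le T\}$, denote by $\{\hat\ell_i^t\}$ the sequence
of independent $\{0,1\}$-valued random variables with

\[ \mathbb{E}[\hat\ell_i^t] = \ell_i^t. \]


THEOREM 18.5.1 ("Replacing losses by coin tosses"). For any decision
strategy $D$ that is defined only for $\{0,1\}$ losses, a corresponding decision strategy
$D_{[0,1]}$ is defined as follows: For each $t$, given $\{\ell^s\}_{s=1}^{t-1}$, applying
$D$ to $\{\hat\ell^s\}_{s=1}^{t-1}$ yields $\hat p^t$. Use $p^t = \mathbb{E}[\hat p^t]$ at
time $t$ in $D_{[0,1]}$. Then

\[ \mathbb{E}\big[R_T(D, \{\hat\ell^t\})\big] \ge R_T\big(D_{[0,1]}, \{\ell^t\}\big). \tag{18.14} \]


PROOF. Denote by $\mathbb{E}_t[\cdot]$ an expectation given the history prior to time
$t$. We have

\[ \mathbb{E}_t[\hat p^t \cdot \hat\ell^t] = \hat p^t \cdot \mathbb{E}_t[\hat\ell^t] = \hat p^t \cdot \ell^t \]

since $\hat p^t$ is determined by $\{\hat\ell^s\}_{s=1}^{t-1}$. Taking an expectation of
both sides,

\[ \mathbb{E}[\hat p^t \cdot \hat\ell^t] = p^t \cdot \ell^t, \]

i.e., the randomization does not change the expected loss of the decider. However,
it may reduce the expected loss of the best expert since

\[ \mathbb{E}\big[\min_i \hat L_i^T\big] \le \min_i \mathbb{E}\big[\hat L_i^T\big] = \min_i L_i^T. \]

Thus,

\[ \sum_t \mathbb{E}[\hat p^t \cdot \hat\ell^t] - \mathbb{E}\big[\min_i \hat L_i^T\big] \ge \sum_t p^t \cdot \ell^t - \min_i L_i^T, \]

yielding (18.14).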


REMARK 18.5.2. From an algorithmic perspective, there is no need to compute
$p^t = \mathbb{E}[\hat p^t]$ in order to implement $D_{[0,1]}$'s decision at time $t$.
Rather, $D_{[0,1]}$ can simply use $\hat p^t$ at step $t$.


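COROLLARY 18.5.3. Minimax regret is attained in {0,1} losses; i.e.,

\[ \min_D \max_{\{\ell^t\} \in \{0,1\}^{nT}} R_T(D, \{\ell^t\}) = \min_{D_{[0,1]}} \max_{\{\ell^t\} \in [0,1]^{nT}} R_T\big(D_{[0,1]}, \{\ell^t\}\big). \tag{18.15} \]


PROOF. Given $D$, construct $D_{[0,1]}$ as in Theorem 18.5.1. Since

\[ \max_{\{\ell^t\} \in \{0,1\}^{nT}} R_T(D, \{\ell^t\}) \ge \mathbb{E}\big[R_T(D, \{\hat\ell^t\})\big] \ge R_T\big(D_{[0,1]}, \{\ell^t\}\big) \quad \forall\, \{\ell^t\} \in [0,1]^{nT}, \]

we have

\[ \max_{\{\ell^t\} \in \{0,1\}^{nT}} R_T(D, \{\ell^t\}) \ge \max_{\{\ell^t\} \in [0,1]^{nT}} R_T\big(D_{[0,1]}, \{\ell^t\}\big), \]

and (18.15) follows.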


18.5.2. Optimal adversary strategy. The adaptive decision-making problem
can be seen as a finite two-player zero-sum game as follows. The pure strategies²
of the adversary (player I) are the loss vectors $\{\ell^t\} \in \{0,1\}^{nT}$. An adversary
strategy $\mathcal{L}$ is a probability distribution over loss sequences
$\ell := \{\ell^t\}_{t=1}^T$. The pure strategies for the decider (player II) are
$a := \{a_t\}_{t=1}^T$, where $a_t : \{0,1\}^{n(t-1)} \to [n]$ maps losses in the first
$t-1$ steps to an action in the $t$th step. A decider strategy $D$ is a probability
distribution over pure strategies. Let

\[ R_T^*(a, \ell) := \sum_t \ell^t_{a_t} - \min_i L_i^T. \]

By the Minimax Theorem

\[ \min_D \max_{\mathcal{L}} \mathbb{E}[R_T^*(a, \ell)] = \max_{\mathcal{L}} \min_D \mathbb{E}[R_T^*(a, \ell)]. \]

Observe that when taking an expectation over $a \sim D$, with $\ell$ fixed, we get

\[ \mathbb{E}[R_T^*(a, \ell)] = \sum_t p^t \cdot \ell^t - \min_i L_i^T = R_T(D, \ell), \]

where $p^t = p^t(\{\ell^s\}_{s=1}^{t-1})$ is the marginal distribution over actions taken
by the decider at time $t$ and $D$ is the sequence of functions $\{p^t\}_{t=1}^T$.


Reducing to balanced adversary strategies 


We say that an adversary strategy $\mathcal{L}$ is balanced if, for every time $t$,
conditioned on the history of losses through time $t-1$, the expected loss of each expert
is the same, i.e., for all pairs of actions $i$ and $j$, we have
$\mathbb{E}_t[\ell_i^t] = \mathbb{E}_t[\ell_j^t]$.

For fixed $a$ and $\ell \sim \mathcal{L}$, write

\[ R_T^*(a, \mathcal{L}) := \mathbb{E}[R_T^*(a, \ell)]. \]


PROPOSITION 18.5.4. Let $R_T^*(\mathcal{L}) := \min_a R_T^*(a, \mathcal{L})$. Then
$\max_{\mathcal{L}} R_T^*(\mathcal{L})$ is attained in balanced strategies.


2 These are oblivious strategies, which do not depend on previous decider actions. See the 
chapter notes for a discussion of adaptive, i.e., nonoblivious, adversary strategies. 




PROOF. Clearly $\min_a R_T^*(a, \mathcal{L})$ is achieved by choosing, at each time step
$t$, the action which has the smallest expected loss in that step, given the history of
losses. We claim that for every $\mathcal{L}$ that is not balanced at time $t$ for some
history, there is an alternative strategy $\tilde{\mathcal{L}}$ that is balanced at $t$
and has

\[ R_T^*(\tilde{\mathcal{L}}) \ge R_T^*(\mathcal{L}). \]

Construct such an $\tilde{\mathcal{L}}$ as follows: Pick $\{\ell^t\}$ according to
$\mathcal{L}$. Let $\tilde\ell^s = \ell^s$ for all $s \ne t$. At time $t$, define for all $i$

\[ \tilde\ell_i^t = \ell_i^t\,\theta_i, \]

where $\theta_i$ is independent of the loss sequence,

\[ \theta_i \in \{0,1\}, \quad \text{and} \quad \mathbb{E}[\theta_i] = \frac{\min_j \mathbb{E}_t[\ell_j^t]}{\mathbb{E}_t[\ell_i^t]}. \]

(Recall that $\mathbb{E}_t[\cdot]$ denotes the expectation given the history prior to
time $t$.) This change ensures that at time $t$, all experts have conditional expected
loss equal to $\min_j \mathbb{E}_t[\ell_j^t]$. The best-response strategy of the decider
at time $t$ remains a best response. The same would hold at future times if the decider
was given $\ell_i^t$ as well as $\tilde\ell_i^t$. Since he has less information, his
expected loss at future times cannot decrease. Moreover, the benchmark loss
$\min_i \mathbb{E}[L_i^T]$ is weakly reduced.


Against a balanced adversary, the sequence of actions of the decider is irrelevant.
Taking $D$ to be the uniform distribution over actions (i.e., $p_i^t = 1/n$ for each
$i$ and $t$), we have

\[ R_T^*(\mathcal{L}) = \mathbb{E}\Big[\frac{1}{n}\sum_{i=1}^n L_i^T - \min_i L_i^T\Big], \tag{18.16} \]

where $L_i^T = \sum_{t=1}^T \ell_i^t$.


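18.5.3. The case of two actions. For $n = 2$, equation (18.16) reduces to

\[ R_T^*(\mathcal{L}) = \mathbb{E}\Big[\frac{1}{2}(L_1^T + L_2^T) - \min(L_1^T, L_2^T)\Big] = \frac{1}{2}\,\mathbb{E}\,|L_1^T - L_2^T|. \]

Write $X_t = \ell_1^t - \ell_2^t$ so that $X_t \in \{-1, 0, 1\}$ with
$\mathbb{E}_t[X_t] = 0$. To maximize³ $\mathbb{E}\big|\sum_{t=1}^T X_t\big| = 2R_T^*(\mathcal{L})$,
we let $X_t \in \{-1, 1\}$ so that $S_t := \sum_{k=1}^t X_k$ is a simple
random walk. In other words, $\ell^t$ is i.i.d., equally likely to be $(1,0)$ or $(0,1)$.
Thus, by the Central Limit Theorem, we have

\[ R_T = R_T^*(\mathcal{L}) = \frac{1}{2}\,\mathbb{E}|S_T| = \frac{1}{2}\sqrt{T}\;\mathbb{E}|Z|\,(1 + o(1)) \]

with $Z \sim N(0,1)$, so $\mathbb{E}|Z| = \sqrt{2/\pi}$.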
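
The asymptotic formula can be compared against the exact value of $\frac{1}{2}\mathbb{E}|S_T|$ for finite $T$. The following is a minimal sketch; the function names are hypothetical, and the exact expectation is computed by summing over the number of $+1$ steps of the walk.

```python
import math

def expected_abs_walk(T):
    """Exact E|S_T| for a simple random walk: S_T = 2k - T when k of the T steps are +1."""
    return sum(math.comb(T, k) * abs(2 * k - T) for k in range(T + 1)) / 2 ** T

def two_action_regret(T):
    """Regret (1/2) E|S_T| forced by i.i.d. losses equally likely to be (1,0) or (0,1)."""
    return 0.5 * expected_abs_walk(T)

def asymptotic_regret(T):
    """Leading-order term (1/2) sqrt(T) E|Z| = sqrt(T / (2 pi))."""
    return math.sqrt(T / (2 * math.pi))
```

For moderately large $T$, the ratio `two_action_regret(T) / asymptotic_regret(T)` should be close to 1, in line with the $(1 + o(1))$ factor.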


Optimal decider 


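3 Let $\{S_t\}$ be a simple random walk on $\mathbb{Z}$. Condition on
$|\{t \le T : X_t \ne 0\}| = m$. Then it suffices to show that
$\mathbb{E}[|S_m|] \le \mathbb{E}[|S_T|]$, which holds since
$\mathbb{E}[|S_m|] = \mathbb{E}\big[\,|\{j \in \{0, \ldots, m-1\} : S_j = 0\}|\,\big]$.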




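To find the optimal $D$, we consider an initial integer gap $h \ge 0$ between the
actions and define

\[ r_T(h) = \min_D \max_{\mathcal{L}} \mathbb{E}\big[\overline{L}_D^T - \min\{L_1^T + h,\ L_2^T\}\big], \]

where $\overline{L}_D^T = \sum_{t=1}^T \ell^t \cdot p^t$. By the Minimax Theorem,

\[ r_T(h) = \max_{\mathcal{L}} \min_D \mathbb{E}\big[\overline{L}_D^T - \min\{L_1^T + h,\ L_2^T\}\big]. \]

As in the discussion above, the optimal adversary is balanced, so we have

\[ r_T(h) = \max_{\mathcal{L}\ \text{balanced}} \mathbb{E}\Big[\frac{1}{2}(L_1^T + L_2^T) - \min(L_1^T + h,\ L_2^T)\Big]
         = \max_{\mathcal{L}\ \text{balanced}} \mathbb{E}\Big[\frac{1}{2}\big(|L_1^T + h - L_2^T| - h\big)\Big]. \]

Again, the adversary's optimal strategy is to select $\ell^t$ i.i.d., equally likely to be
$(1,0)$ or $(0,1)$, so with $\{S_t\}$ denoting a simple random walk,

\[ r_T(h) = \frac{1}{2}\,\mathbb{E}\big[\,|h + S_T| - h\,\big]. \tag{18.17} \]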


Fix an optimal strategy $D$ for the decider. To emphasize the dependence on $T$
and $h$, write $\psi_T(h) = p_1^1$, the probability that $D$ assigns to action 1 in the
first step. If the adversary selects the loss vector $\ell^1 = (1,0)$, then the gap between
the losses of the actions becomes $h + 1$, the decider's expected loss in this step is
$\psi_T(h)$, and

\[ \min(L_1^T + h,\ L_2^T) = \min(\tilde L_1^{T-1} + h + 1,\ \tilde L_2^{T-1}), \]

where $\tilde L_i^{T-1}$ refers to losses in the time interval $[2, T]$. Thus, if the
optimal adversary assigns $\ell^1 = (1,0)$ positive probability, then

\[ r_T(h) = r_{T-1}(h+1) + \psi_T(h). \]

On the other hand, if the adversary selects $\ell^1 = (0,1)$, then the gap between
the losses becomes $h - 1$, the decider's expected loss is $1 - \psi_T(h)$, and

\[ \min(L_1^T + h,\ L_2^T) = \min(\tilde L_1^{T-1} + h,\ \tilde L_2^{T-1} + 1) = 1 + \min(\tilde L_1^{T-1} + h - 1,\ \tilde L_2^{T-1}). \]

Thus, if the optimal adversary assigns $\ell^1 = (0,1)$ positive probability, then

\[ r_T(h) = r_{T-1}(h-1) - 1 + 1 - \psi_T(h) = r_{T-1}(h-1) - \psi_T(h). \]

To maximize regret, the adversary will select $\ell^1$ so that

\[ r_T(h) = \max\big(r_{T-1}(h+1) + \psi_T(h),\ r_{T-1}(h-1) - \psi_T(h)\big). \]

At optimality, the decider will choose $\psi_T(h)$ to equalize these costs⁴, in which case


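\[ \psi_T(h) = \frac{r_{T-1}(h-1) - r_{T-1}(h+1)}{2}. \]

Thus by (18.17)

\[ \psi_T(h) = \frac{1}{4}\,\mathbb{E}\big[\,|h - 1 + S_{T-1}| - |h + 1 + S_{T-1}| + 2\,\big]. \]

Since for $m$ integer

\[ |m - 1| - |m + 1| = \begin{cases} -2 & \text{if } m > 0, \\ 0 & \text{if } m = 0, \\ 2 & \text{if } m < 0, \end{cases} \]

we conclude that

\[ \psi_T(h) = \frac{1}{4}\,\mathbb{E}\big[2 \cdot 1_{\{S_{T-1} + h = 0\}} + 4 \cdot 1_{\{S_{T-1} + h < 0\}}\big]
            = \frac{1}{2}\,\mathbb{P}[S_{T-1} + h = 0] + \mathbb{P}[S_{T-1} + h < 0]. \tag{18.18} \]

In other words, $\psi_T(h)$ is the probability that the action currently lagging by $h$
will be the leader in $T - 1$ steps.


4 This is possible since $0 \le r_{T-1}(h-1) - r_{T-1}(h+1) \le 2$ by definition.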


THEOREM 18.5.5. For $n = 2$, with losses in $\{0,1\}$, the minimax optimal regret is

\[ R_T = \sqrt{\frac{T}{2\pi}}\,(1 + o(1)). \]

The optimal adversary strategy is to take $\ell^t$ i.i.d., equally likely to be $(1,0)$ or
$(0,1)$.

The optimal decision strategy $\{p^t\}_{t=1}^T$ is determined as follows: First,
$p^1 = (1/2, 1/2)$. For $t \in [1, T-1]$, let $i_t$ be an action with the smaller
cumulative loss at time $t$, so that $L_{i_t}^t = \min(L_1^t, L_2^t)$. Also, let
$h_t = |L_1^t - L_2^t|$. At time $t + 1$, take the action $i_t$ with probability
$p_{i_t}^{t+1} = 1 - \psi_{T-t}(h_t)$ and the other action $3 - i_t$ with probability
$p_{3-i_t}^{t+1} = \psi_{T-t}(h_t)$.


Let $\Phi$ denote the standard normal distribution function. By the Central Limit
Theorem,

\[ \psi_T(h) = \Phi(-h/\sqrt{T})\,(1 + o(1)) \quad \text{as } T \to \infty, \]

so the optimal decision algorithm is easy to implement.
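
As an illustration, the quantities $r_T(h)$ and $\psi_T(h)$ can be computed exactly from the binomial distribution of the walk, and $\psi_T(h)$ compared with its normal approximation. This is a sketch under the formulas (18.17) and (18.18); the function names `walk_pmf`, `psi`, `r`, and `phi` are hypothetical.

```python
import math

def walk_pmf(steps):
    """Distribution of S_steps: maps each reachable value s to P[S_steps = s]."""
    return {2 * k - steps: math.comb(steps, k) / 2 ** steps for k in range(steps + 1)}

def psi(T, h):
    """psi_T(h) = (1/2) P[S_{T-1} + h = 0] + P[S_{T-1} + h < 0], as in (18.18)."""
    pmf = walk_pmf(T - 1)
    return 0.5 * pmf.get(-h, 0.0) + sum(p for s, p in pmf.items() if s + h < 0)

def r(T, h):
    """r_T(h) = (1/2) E[|h + S_T| - h], as in (18.17)."""
    pmf = walk_pmf(T)
    return 0.5 * sum(p * (abs(h + s) - h) for s, p in pmf.items())

def phi(x):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))
```

With these helpers one can check the two recursions $r_T(h) = r_{T-1}(h+1) + \psi_T(h) = r_{T-1}(h-1) - \psi_T(h)$ for small $T$ and $h$, and that `psi(T, h)` is close to `phi(-h / math.sqrt(T))` once $T$ is large.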


18.5.4. Adaptive versus oblivious adversaries. In the preceding, we assumed
the adversary is oblivious, i.e., selects the loss vectors $\ell^t = \ell^t(D)$
independently of the actions of the decider. (He can still use the mixed strategy $D$, but
not the actual random choices.)

A more powerful adversary is adaptive, i.e., can select loss vectors $\ell^t$ that
depend on $D$ and also on the previous actions $a_1, a_2, \ldots, a_{t-1}$. With the
(standard) definition of regret that we used, for every $D$, adaptive adversaries cause
the same worst-case regret as oblivious ones; both regrets simply equal the maximum over
individual loss sequences $\max_\ell R_T(D, \ell)$. For this reason, it is often noted
that low regret algorithms like the Multiplicative Weights Algorithm work against adaptive
adversaries as well as against oblivious ones.

Against adaptive adversaries, the notion of regret we use here is

\[ R_T(D, \ell) = \mathbb{E}\Big[\overline{L}_D^T - \min_i \sum_{t=1}^T \ell_i^t(a_1, \ldots, a_{t-1})\Big]. \tag{18.19} \]

An alternative known as policy regret is

\[ \hat R_T(D, \ell) = \mathbb{E}\Big[\overline{L}_D^T - \min_i \sum_{t=1}^T \ell_i^t(i, \ldots, i)\Big]. \tag{18.20} \]

The notion of regret in (18.19) is useful in the setting of learning from expert advice
where it measures the performance of the decider relative to the performance of the
best expert. Next we give some examples where policy regret is more appropriate.


(1) Suppose the actions of the decider lead to big and equal losses to all the 
experts and hence also to him, while if he consistently followed the advice 
of expert 1, then he would have 0 loss. The usual regret for his destructive 
actions will be 0, but the policy regret will be large. Such a scenario could 
arise, for example, if the decider is a large investor in the stock market, 
whose actions affect the prices of various stocks. Another example is when 
the decider launches a military campaign causing huge losses all around. 

(2) Let $\ell = \{\ell^t\}$ be any oblivious loss sequence. Imposing a switching cost
can be modeled as an adaptive adversary $\tilde\ell$ defined by

\[ \tilde\ell_i^t(a_{t-1}) = \ell_i^t + 1_{\{i \ne a_{t-1}\}} \quad \forall\, i, t. \]

The usual regret will ignore the switching cost, i.e.,

\[ R_T(D, \ell) = R_T(D, \tilde\ell) \quad \forall\, D, \]

but policy regret will take it into account.

For example, if $\ell_i^t = 1_{\{i = t \bmod 2\}}$ and there are two actions, then MW
will eventually choose between the two actions with nearly equal probability,
so its expected cost will be $T/2 + O(1)$ plus the expected number
of switches, which is also $T/2$. The cost $L_i^T$ will also be $T/2$ plus the
expected number of switches that MW does. Thus $R_T(MW, \tilde\ell) = O(1)$.
However, with policy regret, the benchmark
$\min_i \sum_t \tilde\ell_i^t(i, \ldots, i) = T/2$,
so $\hat R_T(MW, \tilde\ell) = T/2 + O(1)$.


(3) Consider a decider playing repeated Prisoner's Dilemma (as discussed in
Example 4.1.1 and §6.4) for $T$ rounds.

                                  player II
                          cooperate       defect
    player I  cooperate   (-1, -1)       (-10, 0)
              defect      (0, -10)       (-8, -8)

Suppose that the loss sequence $\ell$ is defined by an opponent playing
Tit-for-Tat⁵. In this case, defining $a_0 = C$, the loss of action $i$ at time $t$
is:

\[ \ell_i^t(a_{t-1}) = \begin{cases} 1 & (a_{t-1}, i) = (C, C), \\ 0 & (a_{t-1}, i) = (C, D), \\ 10 & (a_{t-1}, i) = (D, C), \\ 8 & (a_{t-1}, i) = (D, D). \end{cases} \]

Since it is a dominant strategy to defect in Prisoner's Dilemma,

\[ L^T_{\text{defect}} < L^T_{\text{cooperate}}. \]

(This holds for any opponent, not just Tit-for-Tat.) Thus, for any decider
strategy,

\[ \mathbb{E}[R_T(D, \ell)] = \mathbb{E}\Big[\sum_t 1_{\{a_t = C\}}\big(1_{\{a_{t-1} = C\}} + 2 \cdot 1_{\{a_{t-1} = D\}}\big)\Big]. \]

Minimizing regret will lead the decider towards defecting every round and
incurring a loss of $8(T - 1)$. However, minimizing policy regret will lead
the decider to cooperate for $T - 1$ rounds, yielding a loss of $T - 1$.


5 Recall Definition 6.4.2: Tit-for-Tat is the strategy in which the player cooperates in
round 1 and in every round thereafter plays the strategy his opponent played in the
previous round.
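
The two total losses quoted in Example (3) can be reproduced with a few lines of code. This is a small sketch; the table `LOSS` encodes the loss of the decider's action given his previous action (which Tit-for-Tat copies), and the function name `total_loss` is hypothetical.

```python
# LOSS[(previous action of the decider, current action)] against a Tit-for-Tat opponent,
# who plays in round t whatever the decider played in round t - 1.
LOSS = {("C", "C"): 1, ("C", "D"): 0, ("D", "C"): 10, ("D", "D"): 8}

def total_loss(actions):
    """Total loss of a sequence of decider actions, starting from a_0 = C."""
    prev, total = "C", 0
    for a in actions:
        total += LOSS[(prev, a)]
        prev = a
    return total

T = 10
always_defect = ["D"] * T                      # the regret-minimizing behavior
cooperate_then_defect = ["C"] * (T - 1) + ["D"]  # cooperate for T - 1 rounds
```

Evaluating `total_loss` on the two sequences gives $8(T-1)$ and $T-1$ respectively, which is the gap between the regret-minimizing and policy-regret-minimizing behaviors described above.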


Notes 


A pioneering paper [Han57] on adaptive decision making was written by Hannan in
1957. (Algorithms with sublinear regret are also known as Hannan consistent.) Adaptive
decision making received renewed attention starting in the 1990s due to its applications
in machine learning and algorithms. For in-depth expositions of this topic, see the book
by Cesa-Bianchi and Lugosi and the surveys by Blum and Mansour (Chapter 4
of [Nis07]) and Arora, Hazan, and Kale [AHK12a].


James Hannan 


The binary prediction problem with a perfect expert is folklore and can be found, e.g.,
in [CBL06]; the sharp bound in Theorem 18.1.4 is new, to the best of our knowledge. The
Weighted Majority Algorithm and the Multiplicative Weights Algorithm from §18.2.1 and
§18.3.2 are due to Littlestone and Warmuth [LW89, LW94]. A suite of decision-making
algorithms closely related to the Multiplicative Weights Algorithm [FS97, CBFH+97]
go under different names including Exponential Weights, Hedge, etc. The sharp analysis
in §18.3.2 is due to Cesa-Bianchi et al. [CBFH+97]. The use of the Multiplicative
Weights Algorithm to play zero-sum games, discussed in §18.4, is due to Grigoriadis and
Khachiyan and Freund and Schapire [FS97]. Theorem 18.5.5 is due to [Cov65].
Our exposition follows [GPS14], where optimal strategies for three experts are also
determined. The notion of policy regret discussed in §18.5.4 is due to Arora, Dekel, and
Tewari [ADT12]. Example (3) of §18.5.4 is discussed in [CBL06].

There are several important extensions of the material in this chapter that we have
not addressed: In the multiarmed bandit problem⁶ [BCB12], the decider learns his loss in
each round but does not learn the losses of actions he did not choose. Surprisingly, even
with such limited feedback, the decider can achieve regret $O(\sqrt{Tn\log n})$ against
adaptive adversaries and $O(\sqrt{Tn})$ against oblivious adversaries [AB10].

Instead of comparing his loss to that of the best action, the decider could examine
how his loss would change had he swapped certain actions for others, say, each occurrence
of action $i$ by $f(i)$. Choosing the optimal $f(\cdot)$ in hindsight defines swap regret. Hart
and Mas-Colell showed how to achieve sublinear swap regret using Blackwell’s 
Approachability Theorem [Bla56]. Blum and Mansour [BM05] showed how to achieve 
sublinear swap regret via a reduction to algorithms that achieve sublinear regret. When 
players playing a game use sublinear swap regret algorithms, the empirical distributions 
of play converge to a correlated equilibrium. 

The adaptive learning problem of §18.3.2 can be viewed as online optimization of
linear functions. An important generalization is online convex optimization. See, e.g., the
survey by Shalev-Shwartz [SS11].


In 1951, Brown proposed a simple strategy, known as Fictitious Play or Follow
the Leader, for the repeated play of a two-person zero-sum game: At time t, each player 
plays a best response to the empirical distribution of play by the opponent in the first 
t — 1 rounds. Robinson showed that if both players use fictitious play, their 
empirical distributions converge to optimal strategies. However, as discussed in 
this strategy does not achieve sublinear regret in the setting of binary prediction with 
expert advice. Hannan analyzed a variant, now known as Follow the Perturbed 
Leader, in which each action/expert is initially given a random loss and then henceforth the 
leader is followed. Using a different perturbation, Kalai and Vempala showed that 
this algorithm obtains essentially the same regret bounds as the Multiplicative Weights 
Algorithm. ([KV05] also handles switching costs.) 

Exercise 18.4 is due to Avrim Blum.


Exercises 


18.1. Consider the setting of §18.2 and suppose that $u_t$ (respectively $d_t$) is the
number of leaders voting up (respectively down) at time $t$. Consider a
trader algorithm $A$ that decides up or down at time $t$ with probability $p_t$,
where $p_t = p_t(u_t, d_t)$. Then there is an adversary strategy that ensures that
any such trader algorithm $A$ incurs an expected loss of at least
$\lfloor \log_4(n) \rfloor\, L_*^T$.

Hint: Adapt the adversary strategy in Proposition 18.1.3, ensuring that no
expert incurs more than one mistake during Sp. Repeat.


6 This problem is named after slot machines in Las Vegas known as one-armed bandits.


18.2. In the setting of learning with $n$ experts, at least one of them perfect,
suppose each day there are $q > 2$, rather than two, possibilities to choose
between. Observe that the adversary can still force at least
$\lfloor \log_2(n) \rfloor / 2$ mistakes in expectation. Then show that the decider can
still guarantee that the expected number of mistakes is at most $\log_4(n)$.

Hint: Follow the majority opinion among the $\ell$ current leaders (if such
a majority exists) with probability $p(x)$ (where the majority has size $x\ell$, as
in the binary case); with the remaining probability (which is 1 if there is no
majority), follow a uniformly chosen random leader (not in the majority).


18.3. Show that there are cases where any deterministic algorithm in the experts
setting makes at least twice as many mistakes as the best expert; i.e., for
some $T$, $L(T) \ge 2L_*^T$.


18.4. Consider the following variation on the Weighted Majority Algorithm:
On each day $t$, associate a weight $w_i^t$ with each expert $i$.
Initially, when $t = 1$, set $w_i^1 = 1$ for all $i$.

Each day $t$, follow the weighted majority opinion: Let $U_t$ be the
set of experts predicting up on day $t$, and $D_t$ the set predicting
down. Predict "up" on day $t$ if
$W_U(t) = \sum_{i \in U_t} w_i^t \ge W_D(t) = \sum_{i \in D_t} w_i^t$
and "down" otherwise.

On day $t + 1$, for each $i$ such that (a) expert $i$ predicted incorrectly
on day $t$ and (b) $w_i^t \ge \frac{1}{4n}\sum_j w_j^t$, set

\[ w_i^{t+1} = \frac{1}{2}\,w_i^t. \tag{18.1} \]

Show that the number of mistakes made by the algorithm during every
contiguous subsequence of days, say $\tau, \tau + 1, \ldots, \tau + r$, is
$O(m + \log n)$, where $m$ is the fewest number of mistakes made by any expert on days
$\tau, \tau + 1, \ldots, \tau + r$.


18.5. Consider the sequential adaptive decision-making setting of §18.3 with
unknown time horizon $T$. Adapt the Multiplicative Weights Algorithm
by changing the parameter $\epsilon$ over time to a new value at $t = 2^j$ for
$j = 0, 1, 2, \ldots$. (This is a "doubling trick".) Show that the sequence of
$\epsilon$ values can be chosen so that for every action $i$,

\[ \overline{L}^T \le L_i^T + \frac{\sqrt{2}}{\sqrt{2} - 1}\sqrt{\frac{1}{2}\,T\log n}. \]


18.6. Generalize the results of §18.5.4 for $n = 2$ to the case where the time horizon
$T$ is geometric with parameter $\delta$; i.e., the process stops with probability $\delta$
in every round:

• Determine the minimax optimal adversary and the minimax regret.
• Determine the minimax optimal decision algorithm.


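S 18.7.
(a) For $Y$ a normal $N(0,1)$ random variable, show that

\[ e^{-\frac{y^2}{2}(1 + o(1))} \le \mathbb{P}[Y > y] \le e^{-\frac{y^2}{2}} \quad \text{as } y \to \infty. \]

(b) Suppose that $Y_1, \ldots, Y_n$ are i.i.d. $N(0,1)$ random variables. Show that

\[ \mathbb{E}\Big[\max_{1 \le i \le n} Y_i\Big] = \sqrt{2\log n}\,(1 + o(1)) \quad \text{as } n \to \infty. \tag{18.2} \]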


18.8. Consider an adaptive adversary with bounded memory; that is,

\[ \ell_i^t = \ell_i^t(a_{t-m}, \ldots, a_{t-1}) \]

for constant $m$. Suppose that a decider divides time into blocks of length
$b$ and uses a fixed action, determined by the Multiplicative Weights Algorithm,
in each block. Show that the policy regret of this decider strategy is
$O(\sqrt{Tb\log n} + Tm/b)$. Then optimize over $b$.




APPENDIX A 
Linear programming 


A.1. The Minimax Theorem and linear programming 


Suppose that we want to determine if player I in a two-person zero-sum game
with $m \times n$ payoff matrix $A = (a_{ij})$ can guarantee an expected gain of at least
$V$. It suffices for her to find a mixed strategy $x$ which guarantees her an expected
gain of at least $V$ for each possible pure strategy $j$ player II might play. These
conditions are captured by the following system of inequalities:

\[ x_1 a_{1j} + x_2 a_{2j} + \cdots + x_m a_{mj} \ge V \quad \text{for } 1 \le j \le n. \]

In matrix-vector notation, this system of inequalities becomes:

\[ x^T A \ge V e^T, \]

where $e$ is an all-1's vector. (Its length will be clear from context.)
Thus, to maximize her guaranteed expected gain, player I should find an
$x \in \mathbb{R}^m$ and a $V \in \mathbb{R}$ that

    maximize    $V$
    subject to  $x^T A \ge V e^T$,                                      (A.1)
                $\sum_{1 \le i \le m} x_i = 1$,
                $x_i \ge 0$ for all $1 \le i \le m$.

This is an example of a linear programming problem (LP). Linear programming
is the process of minimizing or maximizing a linear function of a finite set of
real-valued variables, subject to linear equality and inequality constraints on those
variables. In the linear program (A.1), the variables are $V$ and $x_1, \ldots, x_m$.

The problem of finding the optimal strategy for player II can similarly be
formulated as a linear program (LP):

    minimize    $V$
    subject to  $Ay \le V e$,                                           (A.2)
                $\sum_{1 \le j \le n} y_j = 1$,
                $y_j \ge 0$ for all $1 \le j \le n$.
Many fundamental optimization problems in engineering, economics, and transportation
can be formulated as linear programs. A typical example is planning
airline routes. Conveniently, there are well-known efficient (polynomial time) algorithms
for solving linear programs (see notes) and, thus, we can use these algorithms
to solve for optimal strategies in large zero-sum games. In the rest of this appendix,
we give a brief introduction to the theory of linear programming.


A.2. Linear programming basics 


EXAMPLE A.2.1. (The protein problem). Consider the dilemma faced by a 
student-athlete interested in maximizing her protein consumption, while consuming 
no more than 5 units of fat per day and spending no more than $6 a day on protein. 
She considers two alternatives: steak, which costs $4 per pound and contains 2 units 
of protein and 1 unit of fat per pound; and peanut butter, which costs $1 per pound 
and contains 1 unit of protein and 2 units of fat per pound. 


Let $x_1$ be the number of pounds of steak she buys per day, and let $x_2$ be the
number of pounds of peanut butter she buys per day. Then her goal is to

    max         $2x_1 + x_2$
    subject to  $4x_1 + x_2 \le 6$,                                     (A.3)
                $x_1 + 2x_2 \le 5$,
                $x_1, x_2 \ge 0$.


FIGURE A.1. The purple region in the above graphs is the feasible set for
the linear program (A.3). The largest $c$ for which the line $2x_1 + x_2 = c$
intersects the feasible set is $c = 4$. The red arrow from the origin on the right
is perpendicular to all these lines.


The objective function of a linear program is the linear function being optimized,
in this case $2x_1 + x_2$. The feasible set of a linear program is the set
of feasible vectors that satisfy the constraints of the program, in this case, all
nonnegative vectors $(x_1, x_2)$ such that $4x_1 + x_2 \le 6$ and $x_1 + 2x_2 \le 5$.

The left-hand side of Figure A.1 shows this set. The question then becomes:
which point in this feasible set maximizes $2x_1 + x_2$? For (A.3), this point is
$(x_1, x_2) = (1, 2)$, and at this point $2x_1 + x_2 = 4$. Thus, the optimal value
of the linear program is 4.
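
The optimum of (A.3) can also be found by brute force. The sketch below is not a general LP solver: it assumes a bounded two-variable problem whose optimum is attained at a vertex, enumerates intersections of pairs of constraint boundaries in exact rational arithmetic, and uses the hypothetical names `CONSTRAINTS`, `OBJECTIVE`, and `solve_by_vertices`.

```python
from fractions import Fraction
from itertools import combinations

# Each row (a1, a2, b) encodes a1*x1 + a2*x2 <= b.
# The last two rows encode x1 >= 0 and x2 >= 0.
CONSTRAINTS = [(4, 1, 6), (1, 2, 5), (-1, 0, 0), (0, -1, 0)]
OBJECTIVE = (2, 1)  # maximize 2*x1 + x2

def solve_by_vertices(constraints, objective):
    """Return (best value, best point) over all feasible vertices, or None if none exist."""
    best = None
    for (a1, a2, b), (c1, c2, d) in combinations(constraints, 2):
        det = a1 * c2 - a2 * c1
        if det == 0:
            continue  # parallel boundary lines do not meet in a vertex
        # Cramer's rule for the intersection of the two boundary lines.
        x1 = Fraction(b * c2 - a2 * d, det)
        x2 = Fraction(a1 * d - b * c1, det)
        if all(p * x1 + q * x2 <= r for p, q, r in constraints):
            value = objective[0] * x1 + objective[1] * x2
            if best is None or value > best[0]:
                best = (value, (x1, x2))
    return best
```

Calling `solve_by_vertices(CONSTRAINTS, OBJECTIVE)` recovers the value 4 at the point $(1, 2)$, matching the graphical solution.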
A.2.1. Linear programming duality. The Minimax Theorem that we proved
earlier shows that for any zero-sum game, the two linear programs (A.1) and (A.2)
have the same optimal value $V^*$. This is a special case of the cornerstone of linear
programming, the Duality Theorem (Theorem A.2.2 in the next section).

To motivate this theorem, let's consider the LP from the previous section more
analytically. The first constraint of (A.3) immediately implies that the objective
function is upper bounded by 6 on the feasible set. Doubling the second constraint
gives a worse bound of 10. But combining them, we can do better.

Multiplying the first constraint by $y_1 \ge 0$, the second by $y_2 \ge 0$, and adding
the results yields

\[ y_1(4x_1 + x_2) + y_2(x_1 + 2x_2) \le 6y_1 + 5y_2. \tag{A.4} \]

The left-hand side of (A.4) dominates the objective function $2x_1 + x_2$ as long as

\[ 4y_1 + y_2 \ge 2, \qquad y_1 + 2y_2 \ge 1, \qquad y_1, y_2 \ge 0. \tag{A.5} \]

Thus, for any $(y_1, y_2)$ that is feasible for (A.5), we have
$2x_1 + x_2 \le 6y_1 + 5y_2$ for all feasible $(x_1, x_2)$. The best upper bound we can
obtain this way on the optimal value of (A.3) is the solution to the linear program

\[ \min\ 6y_1 + 5y_2 \quad \text{subject to (A.5)}. \tag{A.6} \]

This minimization problem is called the dual of LP (A.3). Observing that
$(y_1, y_2) = (3/7, 2/7)$ is feasible for LP (A.6) with objective value 4, we can conclude
that $(x_1, x_2) = (1, 2)$, which attains objective value 4 for the original problem,
must be optimal.
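
The certificate argument can be verified with exact arithmetic. The following sketch simply evaluates the constraints (A.5) and the two objective values at the points named in the text; the function names are hypothetical.

```python
from fractions import Fraction

def dual_feasible(y1, y2):
    """Constraints (A.5): 4*y1 + y2 >= 2, y1 + 2*y2 >= 1, y1, y2 >= 0."""
    return 4 * y1 + y2 >= 2 and y1 + 2 * y2 >= 1 and y1 >= 0 and y2 >= 0

def dual_value(y1, y2):
    """Objective of the dual (A.6)."""
    return 6 * y1 + 5 * y2

def primal_value(x1, x2):
    """Objective of the primal (A.3)."""
    return 2 * x1 + x2

y = (Fraction(3, 7), Fraction(2, 7))
x = (1, 2)
```

Since `dual_feasible(*y)` holds and `dual_value(*y)` equals `primal_value(*x)`, no feasible primal point can exceed this common value, which is the optimality argument made above.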


A.2.2. Duality, more formally. We say that a maximization linear program
is in standard form¹ if it can be written as

    max         $c^T x$
    subject to  $Ax \le b$,                                             (P)
                $x \ge 0$,

where $A \in \mathbb{R}^{m \times n}$, $x \in \mathbb{R}^n$, $c \in \mathbb{R}^n$, and
$b \in \mathbb{R}^m$. We will call such a linear program (P) the primal LP. We say the
primal LP is feasible if the feasible set

\[ F(P) := \{x \mid Ax \le b,\ x \ge 0\} \]

is nonempty.
As in the example at the beginning of this section, if $y \ge 0$ in $\mathbb{R}^m$
satisfies $y^T A \ge c^T$, then

\[ \forall x \in F(P), \quad y^T b \ge y^T A x \ge c^T x. \tag{A.7} \]

This motivates the general definition of the dual LP


    min         $b^T y$
    subject to  $y^T A \ge c^T$,                                        (D)
                $y \ge 0$,

where $y \in \mathbb{R}^m$. As with the primal LP, we say the dual LP is feasible if the set

\[ F(D) := \{y \mid y^T A \ge c^T;\ y \ge 0\} \]

is nonempty.
It is easy to check that the dual of the dual LP is the primal LP.²


1 It is a simple exercise to convert from nonstandard form (such as a game LP) to standard
form. For example, an equality constraint such as $a_1x_1 + a_2x_2 + \cdots + a_nx_n = b$
can be converted to two inequalities: $a_1x_1 + a_2x_2 + \cdots + a_nx_n \ge b$ and
$a_1x_1 + a_2x_2 + \cdots + a_nx_n \le b$. A $\ge$ inequality can be converted to a $\le$
inequality and vice versa by multiplying by $-1$. A variable $x$ that is not constrained
to be nonnegative can be replaced by the difference $x' - x''$ of two nonnegative
variables, and so on.


THEOREM A.2.2 (The Duality Theorem of Linear Programming). Suppose
that $A \in \mathbb{R}^{m \times n}$, $x, c \in \mathbb{R}^n$, and
$y, b \in \mathbb{R}^m$. If $F(P)$ and $F(D)$ are nonempty, then:

• $b^T y \ge c^T x$ for all $x \in F(P)$ and $y \in F(D)$. (This is called weak
duality.)

• (P) has an optimal solution $x^*$, (D) has an optimal solution $y^*$, and
$c^T x^* = b^T y^*$.

REMARK A.2.3. The proof of the Duality Theorem is similar to the proof of 
the Minimax Theorem. This is not accidental; see the chapter notes. 


COROLLARY A.2.4 (Complementary Slackness). Let $x^*$ be feasible for (P)
and let $y^*$ be feasible for (D). Then the following two statements are equivalent:
(1) $x^*$ is optimal for (P) and $y^*$ is optimal for (D).
(2) For each $i$ such that $\sum_{1 \le j \le n} a_{ij} x_j^* < b_i$ we have $y_i^* = 0$,
    and for each $j$ such that $c_j < \sum_{1 \le i \le m} y_i^* a_{ij}$ we have
    $x_j^* = 0$.


PROOF. Feasibility of $y^*$ and $x^*$ implies that

\[ \sum_j c_j x_j^* \le \sum_j x_j^* \sum_i y_i^* a_{ij} = \sum_i y_i^* \sum_j a_{ij} x_j^* \le \sum_i b_i y_i^*. \tag{A.8} \]

By the Duality Theorem, optimality of $x^*$ and $y^*$ is equivalent to having equality
hold throughout (A.8). Moreover, by feasibility, for each $j$ we have
$c_j x_j^* \le x_j^* \sum_i y_i^* a_{ij}$, and for each $i$ we have
$y_i^* \sum_j a_{ij} x_j^* \le b_i y_i^*$. Thus, equality holds in
(A.8) if and only if (2) holds.
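
Condition (2) is easy to test mechanically for a given pair of feasible points. The following is a minimal sketch for a standard-form pair (P), (D); it assumes the inputs are already feasible and uses the hypothetical name `complementary_slackness_holds`. The data at the bottom are those of the protein problem (A.3) and its dual.

```python
from fractions import Fraction

def complementary_slackness_holds(A, b, c, x, y):
    """Check condition (2) for max c.x s.t. Ax <= b, x >= 0 and its dual.

    x and y are assumed to be feasible for (P) and (D) respectively.
    """
    m, n = len(A), len(A[0])
    for i in range(m):
        row = sum(A[i][j] * x[j] for j in range(n))
        if row < b[i] and y[i] != 0:      # slack primal constraint needs y_i = 0
            return False
    for j in range(n):
        col = sum(y[i] * A[i][j] for i in range(m))
        if c[j] < col and x[j] != 0:      # slack dual constraint needs x_j = 0
            return False
    return True

A = [[4, 1], [1, 2]]
b = [6, 5]
c = [2, 1]
x_opt = [Fraction(1), Fraction(2)]
y_opt = [Fraction(3, 7), Fraction(2, 7)]
```

Here `complementary_slackness_holds(A, b, c, x_opt, y_opt)` is true, while replacing `x_opt` by the feasible but non-optimal point $(0, 0)$ makes the check fail.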


A.2.3. An interpretation of a primal/dual pair. Consider an advertiser
about to purchase advertising space in a set of $n$ newspapers, and suppose that $c_j$
is the price of placing an ad in newspaper $j$. The advertiser is targeting $m$ different
types of users, for example, based on geographic location, interests, age, and gender,
and wants to ensure that, on average, $b_i$ users of type $i$ will see the ad. Denote
by $a_{ij}$ the number of type $i$ users expected to see each ad in newspaper $j$. The
advertiser must decide how many ads to place in each newspaper in order to meet
her various demographic targets at minimum cost. To this end, the advertiser solves
the following linear program, where $x_j$ is the number of ad slots from newspaper $j$
that she will purchase:

    min         $\sum_{1 \le j \le n} c_j x_j$
    subject to  $\sum_{1 \le j \le n} a_{ij} x_j \ge b_i$ for all $1 \le i \le m$,      (A.9)
                $x_1, x_2, \ldots, x_n \ge 0$.

The dual program is

    max         $\sum_{1 \le i \le m} b_i y_i$
    subject to  $\sum_{1 \le i \le m} y_i a_{ij} \le c_j$ for all $1 \le j \le n$,      (A.10)
                $y_1, y_2, \ldots, y_m \ge 0$.


2 A standard form minimization LP can be converted to a maximization LP (and vice versa)
by observing that minimizing $b^T y$ is the same as maximizing $-b^T y$, and $\ge$
inequalities can be converted to $\le$ inequalities by multiplying the inequality by $-1$.


This dual program has a nice interpretation: Consider an online advertising
exchange that matches advertisers with display ad slots. The exchange needs to
determine $y_i$, how much to charge the advertiser for each impression (displayed ad)
shown to a user of type $i$. Observing that $y_i a_{ij}$ is the expected cost of reaching the
same number of type $i$ users online that would be reached by placing a single ad in
newspaper $j$, we see that if the prices $y_i$ are set so that $\sum_{1 \le i \le m} y_i a_{ij} \le c_j$, then
the advertiser can switch from advertising in newspaper $j$ to advertising online,
reaching the same combination of user types without increasing her cost. If the
advertiser switches entirely from advertising in newspapers to advertising online,
the exchange's revenue will be

\[ \sum_{1 \le i \le m} b_i y_i. \]

The Duality Theorem says that the exchange can price the impressions so as
to satisfy the dual constraints and incentivize the advertiser to switch, while still
ensuring that its revenue $\sum_i b_i y_i$ (almost) matches the total revenue of the newspapers.

Moreover, Corollary A.2.4 implies that if the primal constraint is not tight for user
type $i$ in the optimal solution of the primal, i.e., $\sum_j a_{ij} x_j > b_i$, then $y_i = 0$
in the optimal solution of the dual. In other words, if the optimal combination
of ads the advertiser buys from the newspapers results in the advertisement being
shown to more users of type $i$ than necessary, then in the optimal pricing for the
exchange, impressions shown to users of type $i$ will be provided to the advertiser for
free. This means that the exchange concentrates its fixed total charges on the user
types which correspond to tight constraints in the primal. Thus, the advertiser can
switch to advertising exclusively on the exchange without paying more and without
sacrificing any of the "bonus" advertising the newspapers were providing.

(The fact that some impressions are free may seem counterintuitive since pro- 
viding ads has a cost, but it is a consequence of the assumption that the exchange 
maximizes revenue from this advertiser. In reality, the exchange would maximize 
profit, and these goals are equivalent only when the cost of production is zero.) 


Finally, the other consequence of Corollary A.2.4 is that if $x_j > 0$, i.e., some
ads were purchased from newspaper $j$, then the corresponding dual constraint must
be tight, i.e., $\sum_{1 \le i \le m} y_i a_{ij} = c_j$.
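The two consequences of complementary slackness described above can be checked on a small instance. The sketch below is illustrative only: the numbers (two newspapers, two user types) and the function name `check_primal_dual` are assumptions made for this example, not data from the text. It verifies feasibility of a candidate pair, compares the advertiser's cost with the exchange's revenue, and tests both slackness conditions; equal objective values certify optimality of both solutions by weak duality.

```python
from fractions import Fraction as F

def check_primal_dual(c, a, b, x, y):
    """Return (cost, revenue, feasible, slackness_ok) for candidate solutions x, y.

    Primal: min sum_j c[j] x[j]  s.t.  sum_j a[i][j] x[j] >= b[i],  x >= 0.
    Dual:   max sum_i b[i] y[i]  s.t.  sum_i y[i] a[i][j] <= c[j],  y >= 0.
    """
    m, n = len(b), len(c)
    # Users of type i reached by the newspaper purchase x.
    reach = [sum(a[i][j] * x[j] for j in range(n)) for i in range(m)]
    # Online cost of reproducing one ad in newspaper j at prices y.
    online = [sum(y[i] * a[i][j] for i in range(m)) for j in range(n)]
    feasible = (all(v >= 0 for v in x) and all(v >= 0 for v in y)
                and all(reach[i] >= b[i] for i in range(m))
                and all(online[j] <= c[j] for j in range(n)))
    # Slack primal constraint => y_i = 0; x_j > 0 => tight dual constraint.
    slackness_ok = (all(y[i] == 0 or reach[i] == b[i] for i in range(m))
                    and all(x[j] == 0 or online[j] == c[j] for j in range(n)))
    cost = sum(c[j] * x[j] for j in range(n))
    revenue = sum(b[i] * y[i] for i in range(m))
    return cost, revenue, feasible, slackness_ok

# Hypothetical instance: type 2 is over-served by the newspapers, so y_2 = 0.
c = [2, 3]
a = [[1, 2], [1, 1]]
b = [4, 1]
x = [F(0), F(2)]
y = [F(3, 2), F(0)]
print(check_primal_dual(c, a, b, x, y))
```

In this instance the second user type is reached twice although only one impression is required, and accordingly its dual price is zero.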
A.2.4. The proof of the Duality Theorem*. Weak duality follows from
(A.7). We complete the proof of the Duality Theorem in two steps. First, we will use
the Separating Hyperplane Theorem to show that $\sup_{x \in F(P)} c^T x = \inf_{y \in F(D)} b^T y$,
and then we will show that the sup and inf above are attained. For the first step,
we will need the following lemma.


LEMMA A.2.5. Let $A \in \mathbb{R}^{m \times n}$, and let $S = \{Ax \mid x \ge 0\}$. Then $S$ is closed.


PROOF. If the columns of $A$ are linearly independent, then $A : \mathbb{R}^n \to W = A(\mathbb{R}^n)$
is invertible, so there is a linear inverse $L : W \to \mathbb{R}^n$, from which

\[ \{Ax \mid x \ge 0\} = L^{-1}\{x \in \mathbb{R}^n \mid x \ge 0\}, \]

which is closed by continuity of $L$.
Otherwise, if the columns $A^{(j)}$ of $A$ are dependent, then we claim that

\[ \{Ax \mid x \ge 0\} = \bigcup_{k=1}^{n} \Big\{ \sum_{j=1}^{n} z_j A^{(j)} \;\Big|\; z \ge 0,\ z_k = 0 \Big\}. \]

To see this, observe that there is $\lambda \ne 0$ such that $A\lambda = 0$. Without loss of
generality, $\lambda_j < 0$ for some $j$; otherwise, negate $\lambda$. Given $x \in \mathbb{R}^n$, $x \ge 0$, find
the largest $t \ge 0$ such that $x + t\lambda \ge 0$. For this $t$, some $x_k + t\lambda_k = 0$. Thus,
$Ax = A(x + t\lambda) \in \{ \sum_j z_j A^{(j)} \mid z \ge 0,\ z_k = 0 \}$.
Using induction on $n$, we see that $\{Ax \mid x \ge 0\}$ is the union of a finite number
of closed sets, which is closed.


Next, we establish the following “alternative” theorem known as Farkas’ Lemma, 
from which the proof of duality will follow. 


LEMMA A.2.6 (Farkas' Lemma - 2 versions). Let $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$.
Then

(1) Exactly one of the following holds:
    (a) There exists $x \in \mathbb{R}^n$ such that $Ax = b$ and $x \ge 0$ or
    (b) there exists $y \in \mathbb{R}^m$ such that $y^T A \ge 0$ and $y^T b < 0$.
(2) Exactly one of the following holds:
    (a) There exists $x \in \mathbb{R}^n$ such that $Ax \le b$ and $x \ge 0$ or
    (b) there exists $y \in \mathbb{R}^m$ such that $y^T A \ge 0$, $y^T b < 0$ and $y \ge 0$.


PROOF. Part (1): (See Figure A.2 for a visualization of Part (1).) We first
show by contradiction that (a) and (b) can't hold simultaneously: Suppose that $x$
satisfies (a) and $y$ satisfies (b). Then

\[ 0 > y^T b = y^T A x \ge 0, \]

a contradiction.

We next show that if (a) is infeasible, then (b) is feasible: Let $S = \{Ax \mid x \ge 0\}$.
Then $S$ is convex, and by Lemma A.2.5 it is closed. In addition, $b \notin S$ since (a) is
infeasible. Therefore, by the Separating Hyperplane Theorem, there is a hyperplane
that separates $b$ from $S$; i.e., $y^T b < a$ and $y^T z \ge a$ for all $z \in S$. Since $0$ is in
$S$, $a \le 0$ and therefore $y^T b < 0$. Moreover, all entries of $y^T A$ are nonnegative: If
not, say the $k$th entry is negative, then by taking $x_k$ arbitrarily large and $x_i = 0$ for
$i \ne k$, the inequality $y^T A x \ge a$ would be violated for some $x \ge 0$. Thus, it must
be that $y^T A \ge 0$.




FIGURE A.2. The figure illustrates the two cases (1)(a) and (1)(b) of Farkas’ 
Lemma. The shaded region represents all positive combinations of the columns 
of A. 


Part (2): We apply Part (1) to an equivalent pair of systems. The existence of
an $x \in \mathbb{R}^n$ such that $Ax \le b$ and $x \ge 0$ is equivalent to the existence of an $x \ge 0$
in $\mathbb{R}^n$ and $v \ge 0$ in $\mathbb{R}^m$ such that

\[ Ax + Iv = b, \]

where $I$ is the $m \times m$ identity matrix. Applying Part (1) to this system means that
either it is feasible or there is a $y \in \mathbb{R}^m$ such that

\[ y^T A \ge 0, \qquad Iy \ge 0, \qquad y^T b < 0, \]

which is precisely equivalent to (b).
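Alternative (b) of Part (1) is easy to verify mechanically once a candidate $y$ is in hand. The following sketch checks the two conditions $y^T A \ge 0$ and $y^T b < 0$; the function name `is_farkas_certificate` and the sample matrices are assumptions chosen for illustration, not taken from the text.

```python
def is_farkas_certificate(A, b, y):
    """Check alternative (1)(b): y^T A >= 0 entrywise and y^T b < 0.

    If this returns True, then Ax = b has no solution with x >= 0.
    """
    m, n = len(A), len(A[0])
    yTA = [sum(y[i] * A[i][j] for i in range(m)) for j in range(n)]
    yTb = sum(y[i] * b[i] for i in range(m))
    return all(v >= 0 for v in yTA) and yTb < 0

# Hypothetical infeasible system: x1 + 2*x2 = -1 cannot hold with x >= 0.
A = [[1, 2], [0, 1]]
b = [-1, 1]
print(is_farkas_certificate(A, b, [1, 0]))
```

For a feasible right-hand side such as b = [3, 1] (solved by x = (1, 1)), the same y fails the test, as the "exactly one" statement requires.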


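COROLLARY A.2.7. Under the assumptions of Theorem A.2.2,

\[ \sup_{x \in F(P)} c^T x = \inf_{y \in F(D)} b^T y. \]

PROOF. Suppose that $\sup_{x \in F(P)} c^T x < \gamma$. Then $\{Ax \le b;\ -c^T x \le -\gamma;\ x \ge 0\}$
is infeasible, and therefore by Part (2) of Farkas' Lemma, there is $(y, \lambda) \ge 0$
in $\mathbb{R}^{m+1}$ such that $y^T A - \lambda c^T \ge 0$ and $y^T b - \lambda\gamma < 0$. Since there is an $x \in F(P)$,
we have

\[ 0 \le (y^T A - \lambda c^T) x \le y^T b - \lambda c^T x \]

and therefore $\lambda > 0$ (if $\lambda = 0$, the display gives $y^T b \ge 0$, contradicting
$y^T b - \lambda\gamma < 0$). We conclude that $y/\lambda$ is feasible for (D), with objective value
strictly less than $\gamma$.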


To complete the proof of the Duality Theorem, we need to show that the sup
and inf in Corollary A.2.7 are attained. This will follow from the next theorem.


THEOREM A.2.8. Let $A \in \mathbb{R}^{m \times n}$ and $b \in \mathbb{R}^m$. Denote
\[ F(P_=) = \{x \in \mathbb{R}^n \mid x \ge 0 \text{ and } Ax = b\}. \]
(i) If $F(P_=) \ne \emptyset$ and $\sup\{c^T x \mid x \in F(P_=)\} < \infty$, then this sup is attained.
(ii) If $F(P) \ne \emptyset$ and $\sup\{c^T x \mid x \in F(P)\} < \infty$, then this sup is attained.


The proof of (i) will show that the sup is attained at one of a distinguished,
finite set of points in $F(P_=)$ known as extreme points or vertices.


DEFINITION A.2.9. 
(1) Let $S$ be a convex set. A point $x \in S$ is an extreme point of $S$ if
whenever $x = \alpha u + (1 - \alpha) v$ with $u, v \in S$ and $0 < \alpha < 1$, we must have
$x = u = v$.
(2) If S is the feasible set of a linear program, then S is convex; an extreme 
point of S is called a vertex. 


We will need the following lemma. 


LEMMA A.2.10. Let $x \in F(P_=)$. Then $x$ is a vertex of $F(P_=)$ if and only if
the columns $\{A^{(j)} \mid x_j > 0\}$ are linearly independent.


PROOF. Suppose $x$ is not extreme; i.e., $x = \alpha v + (1 - \alpha) w$, where $v \ne w$,
$0 < \alpha < 1$, and $v, w \in F(P_=)$. Thus, $A(v - w) = 0$ and $v - w \ne 0$. Observe that
$v_j = w_j = 0$ for all $j \notin S$, where $S = \{j \mid x_j > 0\}$, since $v_j, w_j \ge 0$. We conclude
that the columns $\{A^{(j)} \mid x_j > 0\}$ are linearly dependent.

For the other direction, suppose that the vectors $\{A^{(j)} \mid x_j > 0\}$ are linearly
dependent. Then there is $w \ne 0$ such that $Aw = 0$ and $w_j = 0$ for all $j \notin S$. Then
for $\varepsilon$ sufficiently small, $x \pm \varepsilon w \in F(P_=)$, and therefore $x$ is not extreme.


LEMMA A.2.11. Suppose that $\sup\{c^T x \mid x \in F(P_=)\} < \infty$. Then for any point
$x \in F(P_=)$, there is a vertex $\tilde{x} \in F(P_=)$ with $c^T \tilde{x} \ge c^T x$.


PROOF. We show that if $x$ is not a vertex, then there exists $x' \in F(P_=)$ with
a strictly larger number of zero entries than $x$, such that $c^T x' \ge c^T x$. This step
can be applied only a finite number of times before we reach a vertex that satisfies
the conditions of the lemma.

Let $S = \{j \mid x_j > 0\}$. If $x$ is not a vertex, then the columns $\{A^{(j)} \mid j \in S\}$ are
linearly dependent and there is a vector $\lambda \ne 0$ such that $\sum_j \lambda_j A^{(j)} = A\lambda = 0$ with
$\lambda_j = 0$ for $j \notin S$.

Without loss of generality, $c^T \lambda \ge 0$ (if not, negate $\lambda$). Consider the vector
$\hat{x}(t) = x + t\lambda$. For $t \ge 0$, we have $c^T \hat{x}(t) \ge c^T x$ and $A\hat{x}(t) = b$. For $t$ sufficiently
small, $\hat{x}(t)$ is also nonnegative and thus feasible.

If there is $j \in S$ such that $\lambda_j < 0$, then there is a positive $t$ such that $\hat{x}(t)$ is
feasible with strictly more zeros than $x$, so we can take $x' = \hat{x}(t)$.

The same conclusion holds if $\lambda_j \ge 0$ for all $j$ and $c^T \lambda = 0$; simply negate $\lambda$
and apply the previous argument.

To complete the argument, we show that the previous two cases are exhaustive:
if $\lambda_j \ge 0$ for all $j$ and $c^T \lambda > 0$, then $\hat{x}(t) \ge 0$ for all $t \ge 0$ and
$\lim_{t \to \infty} c^T \hat{x}(t) = \infty$, contradicting the assumption that the objective value
is bounded on $F(P_=)$.
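A single step of this argument can be written out as a short routine. The sketch below assumes that a dependency vector `lam` (with $A\lambda = 0$ and $\lambda_j = 0$ off the support of $x$) has already been found by some other means; the function name `reduce_support` and the sample data are illustrative assumptions. Exact rational arithmetic is used so that the new zero entry is exactly zero.

```python
from fractions import Fraction as F

def reduce_support(x, lam, c):
    """Move from feasible x along lam until one more coordinate hits zero.

    Assumes A @ lam = 0 and lam[j] = 0 wherever x[j] = 0, so A @ x is unchanged.
    Returns x' with c.x' >= c.x and strictly more zero entries than x.
    """
    dot = sum(cj * lj for cj, lj in zip(c, lam))
    # Orient lam so that c.lam >= 0; if c.lam == 0, ensure some entry is negative.
    if dot < 0 or (dot == 0 and all(l >= 0 for l in lam)):
        lam = [-l for l in lam]
        dot = -dot
    if all(l >= 0 for l in lam):
        # Remaining case: lam >= 0 and c.lam > 0, so the objective is unbounded.
        raise ValueError("objective unbounded along lam")
    # Largest step keeping x + t*lam nonnegative.
    t = min(xj / -lj for xj, lj in zip(x, lam) if lj < 0)
    return [xj + t * lj for xj, lj in zip(x, lam)]

# Hypothetical instance with A = [[1, 1, 1]], b = [1], c = (1, 2, 0).
x = [F(1, 2), F(1, 2), F(0)]
lam = [F(1), F(-1), F(0)]          # A @ lam = 0
print(reduce_support(x, lam, [1, 2, 0]))
```

Here the routine negates `lam` (since $c^T\lambda = -1$), steps by $t = 1/2$, and lands on the vertex $(0, 1, 0)$, raising the objective from $3/2$ to $2$.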


PROOF OF Part (i): Lemma A.2.11 shows that if the linear
program
maximize $c^T x$ subject to $x \in F(P_=)$
is feasible and bounded, then for every feasible solution, there is a vertex with
at least that objective value. Thus, we can search for the optimum of the linear
program by considering only vertices of $F(P_=)$. Since there are only finitely many
(by Lemma A.2.10, each vertex is determined by a linearly independent set of
columns of $A$), the optimum is attained.

Part (ii): We apply the reduction from Part (2) of Farkas' Lemma to show
that the linear program (P) is equivalent to a program of the type considered in
Part (i) with a matrix $(A; I)$ in place of $A$.


A.3. Notes 


There are many books on linear programming; for an introduction, see [MG07]. As
mentioned in the notes to Chapter 2, linear programming duality is due to von Neumann.

Linear programming problems were first formulated by Fourier in 1827. In 1939, 
Kantorovich published a book on applications of linear programming and sug- 
gested algorithms for their solution. He shared the 1975 Nobel Prize in Economics with 
T. Koopmans “for their contributions to the theory of optimum allocation of resources.” 

Dantzig's development of the Simplex Algorithm in 1947 (see [Dan51a]) was
significant as, for many real-world linear programming problems, it was computationally
efficient. Dantzig was largely motivated by the need to solve planning problems for the 
air force. See for the history of linear programming up to 1982. In 1970, Klee 
and Minty showed that the Simplex Algorithm could require exponential time in 
the worst case. Leonid Khachiyan developed the first polynomial time algorithm 
in 1979; The New York Times referred to this paper as “the mathematical Sputnik.” In 
1984, Narendra Karmarkar introduced a more efficient algorithm using interior 
point methods. These are just a few of the high points in the extensive research literature 
on linear programming. 

We showed that the Minimax Theorem follows from linear programming duality. In
fact, the converse also holds; see, e.g., [Dan51b], [Adl13].


Exercises 


A.1. Prove that linear programs and are dual to each other.


A.2. Prove Theorem 17.1.1 using linear programming duality and the Birkhoff-
von Neumann Theorem (Exercise 3.4).
Some useful probability tools


B.1. The second moment method 


LEMMA B.1.1. Let $X$ be a nonnegative random variable. Then
\[ P(X > 0) \ge \frac{(E[X])^2}{E[X^2]}. \]

PROOF. The lemma follows from this version of the Cauchy-Schwarz inequality:
\[ (E[XY])^2 \le E[X^2]\, E[Y^2]. \tag{B.1} \]
Applying (B.1) to $X$ and $Y = 1_{X > 0}$, we obtain
\[ (E[X])^2 \le E[X^2]\, E[Y^2] = E[X^2]\, P(X > 0). \]
Finally, we prove (B.1). Without loss of generality $E[X^2]$ and $E[Y^2]$ are both positive.
Letting $U = X/\sqrt{E[X^2]}$ and $V = Y/\sqrt{E[Y^2]}$ and using the fact that $2UV \le U^2 + V^2$, we
obtain
\[ 2E[UV] \le E[U^2] + E[V^2] = 2. \]
Therefore,
\[ (E[UV])^2 \le 1, \]
which is equivalent to (B.1).
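As a quick sanity check of Lemma B.1.1, the bound can be evaluated exactly for a finite distribution. The distribution below and the function name `second_moment_bound` are assumptions made for illustration; the sketch also assumes $E[X^2] > 0$ so that the ratio is defined.

```python
from fractions import Fraction as F

def second_moment_bound(dist):
    """dist maps nonnegative values to probabilities.

    Returns (P(X > 0), (E[X])^2 / E[X^2]); assumes E[X^2] > 0.
    """
    mean = sum(v * p for v, p in dist.items())
    second = sum(v * v * p for v, p in dist.items())
    positive = sum(p for v, p in dist.items() if v > 0)
    return positive, mean * mean / second

# Hypothetical distribution: X = 0, 1, 3 with probabilities 1/2, 1/4, 1/4.
prob_positive, bound = second_moment_bound({0: F(1, 2), 1: F(1, 4), 3: F(1, 4)})
print(prob_positive, bound)
```

For this distribution $E[X] = 1$ and $E[X^2] = 5/2$, so the lemma guarantees $P(X > 0) \ge 2/5$, while the true value is $1/2$.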


B.2. The Hoeffding-Azuma Inequality 


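LEMMA B.2.1 (Hoeffding Lemma [Hoe63]). Suppose that $X$ is a random variable with
distribution $F$ such that $a \le X \le a + 1$ for some $a \le 0$ and $E[X] = 0$. Then for any
$\lambda \in \mathbb{R}$,

\[ E[e^{\lambda X}] \le e^{\lambda^2/8}. \]


PROOF. This proof is from [BLM13]. Let
\[ \Psi(\lambda) = \log E[e^{\lambda X}]. \]
Observe that
\[ \Psi'(\lambda) = \frac{E[X e^{\lambda X}]}{E[e^{\lambda X}]} = \int x \, dF_\lambda, \]
where
\[ F_\lambda(u) = \frac{\int_{-\infty}^{u} e^{\lambda x}\, dF}{\int_{-\infty}^{\infty} e^{\lambda x}\, dF}. \]
Also,
\[
\Psi''(\lambda) = \frac{E[e^{\lambda X}]\, E[X^2 e^{\lambda X}] - (E[X e^{\lambda X}])^2}{(E[e^{\lambda X}])^2}
= \int x^2 \, dF_\lambda - \Big( \int x \, dF_\lambda \Big)^2 = \mathrm{Var}(X_\lambda),
\]
where $X_\lambda$ has law $F_\lambda$.
For any random variable $Y$ with $a \le Y \le a + 1$, we have
\[ \mathrm{Var}(Y) \le E\Big[ \Big( Y - a - \frac{1}{2} \Big)^2 \Big] \le \frac{1}{4}. \]
In particular,
\[ |\Psi''(\lambda)| \le 1/4 \]
for all $\lambda$. Since $\Psi(0) = \Psi'(0) = 0$, it follows that $|\Psi'(\lambda)| \le |\lambda|/4$ and thus
\[ \Psi(\lambda) \le \Big| \int_0^\lambda \frac{\theta}{4}\, d\theta \Big| = \frac{\lambda^2}{8} \]
for all $\lambda$.


THEOREM B.2.2 (Hoeffding-Azuma Inequality). Let $S_t = \sum_{i=1}^{t} X_i$ be a martingale;
i.e., $E[S_{t+1} \mid H_t] = S_t$ where $H_t = (X_1, X_2, \ldots, X_t)$ represents the history. If all
$|X_t| \le 1$, then

\[ P[S_t \ge R] \le e^{-R^2/2t}. \]


PROOF. Since $-\frac{1}{2} \le \frac{X_{t+1}}{2} \le \frac{1}{2}$, the previous lemma gives

\[ E[e^{\lambda X_{t+1}} \mid H_t] \le e^{(2\lambda)^2/8} = e^{\lambda^2/2}, \]

so
\[ E[e^{\lambda S_{t+1}} \mid H_t] = e^{\lambda S_t}\, E[e^{\lambda X_{t+1}} \mid H_t] \le e^{\lambda^2/2} e^{\lambda S_t}. \]

Taking expectations,

\[ E[e^{\lambda S_{t+1}}] \le e^{\lambda^2/2}\, E[e^{\lambda S_t}], \]
so by induction on $t$
\[ E[e^{\lambda S_t}] \le e^{t\lambda^2/2}. \]
Finally, by Markov's Inequality,
\[ P[S_t \ge R] = P[e^{\lambda S_t} \ge e^{\lambda R}] \le e^{-\lambda R} e^{t\lambda^2/2}. \]
Optimizing, we choose $\lambda = R/t$, so

\[ P[S_t \ge R] \le e^{-R^2/2t}. \]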
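For a concrete feel for the Hoeffding-Azuma bound, it can be compared with an exact tail probability in the simplest martingale with $|X_t| \le 1$: the simple random walk with independent fair $\pm 1$ steps. This choice of example and the function names `walk_tail` and `azuma_bound` are assumptions made for illustration.

```python
from math import comb, exp

def walk_tail(t, R):
    """Exact P[S_t >= R] for the simple random walk with fair +1/-1 steps."""
    # If k of the t steps are +1, then S_t = 2k - t.
    favourable = sum(comb(t, k) for k in range(t + 1) if 2 * k - t >= R)
    return favourable / 2 ** t

def azuma_bound(t, R):
    """The Hoeffding-Azuma upper bound exp(-R^2 / (2t))."""
    return exp(-R * R / (2 * t))

for t, R in [(20, 10), (50, 20), (100, 30)]:
    print(t, R, walk_tail(t, R), azuma_bound(t, R))
```

The bound is not tight for this walk, but it captures the Gaussian-type decay in $R/\sqrt{t}$.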




APPENDIX C 


Convex functions 


We review some basic facts about convex functions: 
(1) A function $f : [a,b] \to \mathbb{R}$ is convex if for all $x, z \in [a,b]$ and $\alpha \in (0,1)$ we have

\[ f(\alpha x + (1 - \alpha) z) \le \alpha f(x) + (1 - \alpha) f(z). \tag{C.1} \]


FIGURE C.1. A convex function $f$.


(2) The definition implies that the supremum of any family of convex functions is 
convex. 


$f(x) = \sup(f_1(x), f_2(x), f_3(x))$


FIGURE C.2. The supremum of three convex functions.


(3) For $x < y$ in $[a,b]$ denote by $S(x,y) = \frac{f(y) - f(x)}{y - x}$ the slope of $f$ on $[x,y]$.
Convexity of $f$ is equivalent to the inequality

\[ S(x,y) \le S(y,z) \]

holding for all $x < y < z$ in $[a,b]$.

(4) For $x < y < z$, the inequality in (3) is equivalent to $S(x,y) \le S(x,z)$ and to
$S(x,z) \le S(y,z)$. Thus, for $f$ convex in $[a,b]$, the slope $S(x,y)$ is (weakly)
monotone increasing in $x$ and in $y$ as long as $x, y$ are in $[a,b]$. This implies
continuity of $f$ in $(a,b)$.

(5) It follows from (3) and the Mean Value Theorem that if $f$ is continuous in $[a,b]$
and has a (weakly) increasing derivative in $(a,b)$, then $f$ is convex in $[a,b]$.

(6) The monotonicity in (4) implies that a convex function $f$ in $[a,b]$ has an increasing
right derivative $f'_+$ in $[a,b)$ and an increasing left derivative $f'_-$ in $(a,b]$. Since
$f'_+(x) \le f'_-(y)$ for any $x < y$, we infer that $f$ is differentiable at every point of
continuity in $(a,b)$ of $f'_+$.

(7) Since increasing functions can have only countably many discontinuities, a convex
function is differentiable with at most countably many exceptions. The
convex function $f(x) = \sum_{n \ge 1} |x - 1/n|/n^2$ indeed has countably many points
of nondifferentiability.

(8) Definition: We say that $s \in \mathbb{R}$ is a subgradient of $f$ at $x$ if

\[ f(y) \ge f(x) + s \cdot (y - x) \quad \forall y \in [a,b]. \tag{C.2} \]
The right-hand side as a function of $y$ is called a supporting line of $f$ at $x$.
See Figure C.3 and Figure C.4.


(9) If $s(t)$ is a subgradient of $f$ at $t$ for each $t \in [a,b]$, then
\[ s(y)(x - y) \le f(x) - f(y) \le s(x)(x - y) \quad \forall x, y \in [a,b]. \tag{C.3} \]

These inequalities imply that $s(\cdot)$ is weakly increasing and $f(\cdot)$ is continuous on
$[a,b]$.

(10) Fact: Let $f : [a,b] \to \mathbb{R}$ be any function. Then $f$ has a subgradient for all
$x \in [a,b]$ if and only if $f$ is convex and continuous on $[a,b]$.
Proof:
$\Rightarrow$: $f$ is the supremum of affine functions (the supporting lines). Continuity
at the endpoints follows from the existence of a subgradient at these points.
$\Leftarrow$: By (4), any $s \in [f'_-(x), f'_+(x)]$ is a subgradient.

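(11) Proposition: If $s(x)$ is a subgradient of $f$ at $x$ for every $x \in [a,b]$, then

\[ f(t) = f(a) + \int_a^t s(x)\, dx \quad \forall t \in [a,b]. \]

Proof: By translation, we may assume that $a = 0$. Fix $t \in (0,b]$ and $n \ge 1$.
Define

\[ t_k := \frac{kt}{n}. \]

For $x \in [t_{k-1}, t_k)$, define

\[ g_n(x) = s(t_{k-1}) \quad \text{and} \quad h_n(x) = s(t_k). \]
Then $g_n(\cdot) \le s(\cdot) \le h_n(\cdot)$ in $[0,t)$, so

\[ \int_0^t g_n(x)\, dx \le \int_0^t s(x)\, dx \le \int_0^t h_n(x)\, dx. \tag{C.4} \]
By (C.3),

\[ \frac{t}{n}\, s(t_{k-1}) \le f(t_k) - f(t_{k-1}) \le \frac{t}{n}\, s(t_k) \quad \forall k \in [1,n]. \]

Summing over $k \in [1,n]$ yields
\[ \int_0^t g_n(x)\, dx \le f(t) - f(0) \le \int_0^t h_n(x)\, dx. \tag{C.5} \]

Direct calculation gives that
\[ \int_0^t h_n(x)\, dx - \int_0^t g_n(x)\, dx = [s(t) - s(0)]\, \frac{t}{n}, \]
so by (C.4) and (C.5), we deduce that
\[ \Big| f(t) - f(0) - \int_0^t s(x)\, dx \Big| \le [s(t) - s(0)]\, \frac{t}{n}. \]

Taking $n \to \infty$ completes the proof.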


FIGURE C.3. The line $\ell_t(\cdot)$ is a supporting line at $t$ and $f'_+(t)$ is a subgradient
of $f$ at $t$.


FIGURE C.4. A collection of supporting lines at $t$.


FIGURE C.5


(12) Jensen's inequality: If $f : [a,b] \to \mathbb{R}$ is convex and $X$ is a random variable
taking values in $[a,b]$, then $f(E[X]) \le E[f(X)]$. (Note that for $X$ taking just
two values, this is the definition of convexity.)

Proof: Let $\ell(\cdot)$ be a supporting line for $f$ at $E[X]$. Then by linearity of
expectation,
\[ f(E[X]) = \ell(E[X]) = E[\ell(X)] \le E[f(X)]. \]

(13) The definition of convex functions extends naturally to any function defined
on a convex set $K$ in a vector space. Observe that the function $f : K \to \mathbb{R}$
is convex if and only if for any $x, y \in K$, the function

\[ \Psi(t) = f(tx + (1 - t)y) \]
is convex on $[0,1]$. It follows that

\[ \Psi(1) \ge \Psi(0) + \Psi'_+(0); \]
i.e., for all $x, y \in K$,

\[ f(x) \ge f(y) + \nabla f(y) \cdot (x - y). \tag{C.6} \]
A vector $v \in \mathbb{R}^n$ is a subgradient of a convex function $f : \mathbb{R}^n \to \mathbb{R}$ at $y$ if
for all $x$

\[ f(x) \ge f(y) + v \cdot (x - y). \]
If $f$ is differentiable at $y$, then the only subgradient is $\nabla f(y)$.
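The supporting-line proof of Jensen's inequality in (12) can be traced numerically. The sketch below uses $f(x) = x^2$, whose only subgradient at $t$ is the derivative $2t$; the choice of $f$, the sample distribution, and the helper names are all assumptions made for illustration.

```python
from fractions import Fraction as F

def f(x):
    """A convex function; here f(x) = x^2."""
    return x * x

def supporting_line(t):
    """Supporting line of f at t: y -> f(t) + s*(y - t) with subgradient s = 2t."""
    return lambda y: f(t) + 2 * t * (y - t)

def expectation(g, dist):
    """E[g(X)] for a finite distribution given as {value: probability}."""
    return sum(p * g(v) for v, p in dist.items())

# Hypothetical distribution: X = -1, 1, 3 with probabilities 1/2, 1/4, 1/4.
dist = {F(-1): F(1, 2), F(1): F(1, 4), F(3): F(1, 4)}
mean = expectation(lambda v: v, dist)
line = supporting_line(mean)

# The chain f(E[X]) = l(E[X]) = E[l(X)] <= E[f(X)] from the proof:
print(f(mean), line(mean), expectation(line, dist), expectation(f, dist))
```

The first three printed values coincide because the line touches $f$ at $E[X]$ and expectation is linear; the last is larger because the line lies below $f$ everywhere.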
Solution sketches for selected exercises


Chapter 


Consider the betting game with the following payoff matrix: 


                  player II
                   L    R
player I    T      0    2
            B      5    1

Draw graphs for this game analogous to those shown in |Figure 2.2| and 
determine the value of the game. 


Solution sketch. 


Suppose that player I plays T with probability $x_1$ and B with probability $1 - x_1$,
and suppose that player II plays L with probability $y_1$ and R with probability
$1 - y_1$. (We note that in this game there is no saddle point.)


FIGURE D.1. The left side of the figure shows the worst-case expected gain of
player I as a function of her mixed strategy (where she plays T with probability
$x_1$ and B with probability $1 - x_1$). This worst-case expected gain is maximized
when she plays T with probability 2/3 and B with probability 1/3. The
right side of the figure shows the worst-case expected loss of player II as a
function of his mixed strategy when he plays L with probability $y_1$ and R
with probability $1 - y_1$. The worst-case expected loss is minimized when he
plays L with probability 1/6 and R with probability 5/6.
Reasoning from player I's perspective, her expected gain is $2(1 - y_1)$ for
playing the pure strategy T, and $4y_1 + 1$ for playing the pure strategy B. Thus,
if she knows $y_1$, she will pick the strategy corresponding to the maximum of
$2(1 - y_1)$ and $4y_1 + 1$. Player II can choose $y_1 = 1/6$ so as to minimize this
maximum, and the expected amount player II will pay player I is 5/3. This is
the player II strategy that minimizes his worst-case loss. See Figure D.1 for an
illustration.

From player II's perspective, his expected loss is $5(1 - x_1)$ if he plays the
pure strategy L and $1 + x_1$ if he plays the pure strategy R, and he will aim to
minimize this expected payout. In order to maximize this minimum, player I will
choose $x_1 = 2/3$, which again yields an expected gain of 5/3.
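The equalizing computation above works for any $2 \times 2$ zero-sum game without a saddle point. A minimal sketch, in which the function name `solve_2x2` is an assumption and the closed-form expressions are the standard indifference equations rather than anything specific to this exercise:

```python
from fractions import Fraction as F

def solve_2x2(a, b, c, d):
    """Solve the zero-sum game [[a, b], [c, d]] (payoffs to the row player).

    Assumes the game has no saddle point, so both players mix and the
    denominator below is nonzero. Returns (x1, y1, value), where x1 is the
    probability of the first row and y1 the probability of the first column.
    """
    den = a - b - c + d
    x1 = F(d - c, den)          # makes the column player indifferent
    y1 = F(d - b, den)          # makes the row player indifferent
    value = F(a * d - b * c, den)
    return x1, y1, value

# The betting game above: rows T, B and columns L, R.
print(solve_2x2(0, 2, 5, 1))
```

For the betting game this reproduces $x_1 = 2/3$, $y_1 = 1/6$, and value $5/3$.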


Prove that if equation holds, then player I can safely ignore row i. 


Solution sketch. Consider any mixed strategy $x$ for player I, and use it to construct
a new strategy $z$ in which $z_i = 0$, $z_\ell = x_\ell + \beta_\ell x_i$ for $\ell \in I$, and $z_k = x_k$ for
$k \notin I \cup \{i\}$. Then, against player II's $j$-th strategy

\[ (z^T A - x^T A)_j = \sum_{\ell \in I} (x_\ell + \beta_\ell x_i - x_\ell)\, a_{\ell j} - x_i a_{ij} \ge 0. \]


Two players each choose a number in [0, 1]. If they choose the same number, the 
payoff is 0. Otherwise, the player that chose the lower number pays $1 to the 
player who chose the higher number, unless the higher number is 1, in which case 
the payment is reversed. Show that this game has no mixed Nash equilibrium. 
Show that the safety values for players I and II are —1 and 1 respectively. 


Solution sketch. Given a mixed strategy $F \in \Delta$ for player I and given $\varepsilon > 0$, find
a point $a$ such that $F(a) > 1 - \varepsilon$. Then taking $G$ supported in $[a, 1)$ yields a
payoff of at least $1 - \varepsilon$ to player II.


Given a 5 x 5 zero-sum game, such as the following, how would you quickly 
determine by hand if it has a saddle point: 


20 1 4 3 1 


Solution sketch. A simple approach is to “star” the maximum element in each 
column and underline the minimum element in each row. (If there is more than 
one, star/underline all of them.) Any element in the matrix that is both starred 
and underlined is a saddle point. In the example below, the 6 at position (3, 4) 
is a saddle point. 
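The star-and-underline procedure translates directly into code. In the sketch below the function name `saddle_points` and the sample matrices are assumptions for illustration (they are not the matrix of the exercise), and positions are reported with 0-based indices.

```python
def saddle_points(a):
    """Return all (row, column) positions that are saddle points of matrix a.

    An entry is "starred" if it is a maximum of its column and "underlined"
    if it is a minimum of its row; saddle points carry both marks.
    """
    col_max = [max(col) for col in zip(*a)]
    row_min = [min(row) for row in a]
    return [(i, j)
            for i, row in enumerate(a)
            for j, v in enumerate(row)
            if v == row_min[i] and v == col_max[j]]

# Hypothetical 3 x 3 example: the 3 in the last row, middle column, is a saddle point.
print(saddle_points([[4, 2, 5], [3, 1, 0], [6, 3, 7]]))
# Matching Pennies has no saddle point.
print(saddle_points([[1, -1], [-1, 1]]))
```

This takes one pass to compute the row minima and column maxima and one pass to compare, which mirrors the by-hand method.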

Consider a directed graph $G = (V, E)$ with nonnegative weights $w_{ij}$ on each edge
$(i,j)$. Let $W_i = \sum_j w_{ij}$. Each player chooses a vertex, say $i$ for player I and $j$
for player II. Player I receives a payoff of $w_{ij}$ if $i \ne j$ and loses $W_i - w_{ii}$ if $i = j$.
Thus, the payoff matrix $A$ has entries $a_{ij} = w_{ij} - 1_{\{i=j\}} W_i$. If $n = 2$ and the
$w_{ij}$'s are all 1, this game is called Matching Pennies.


• Show that the game has value 0.


Solution sketch. $\sum_j a_{ij} = 0$ for all $i$, so by giving all vertices equal weight,
player II can ensure a loss of at most 0.
Conversely, for any strategy $y \in \Delta_n$ for player II, player I can select action
$i$ with $y_i = \min_k y_k$, yielding a payoff of

\[ \sum_j a_{ij} y_j = \sum_j w_{ij} (y_j - y_i) \ge 0. \]


• Deduce that for some $x \in \Delta_n$, $x^T A = 0$.


Solution sketch. By the Minimax Theorem, $\exists x \in \Delta_n$ with $x^T A \ge 0$. Since
$x^T A \mathbf{1} = 0$, we must have $x^T A = 0$.


2.19. Prove that if set $G \subset \mathbb{R}^d$ is compact and $H \subset \mathbb{R}^d$ is closed, then $G + H$ is closed.
(This fact is used in the proof of the Minimax Theorem to show that the set $K$
is closed.)

Solution sketch. Suppose that $x_n + y_n \to z$, where $x_n \in G$ and $y_n \in H$ for all $n$.
Then there is a subsequence $x_{n_k} \to x \in G$; we infer that $y_{n_k} \to z - x$, whence
$z - x \in H$.


2.20. Find two sets $F_1, F_2 \subset \mathbb{R}^2$ that are closed such that $F_1 - F_2$ is not closed.


Solution sketch. $F_1 = \{xy \ge 1\}$, $F_2 = \{x = 0\}$, $F_1 + F_2 = \{x > 0\}$.


2.21. Consider a zero-sum game $A$ and suppose that $\pi$ and $\sigma$ are permutations of player
I's strategies $\{1, \ldots, m\}$ and player II's strategies $\{1, \ldots, n\}$, respectively, such
that

\[ a_{\pi(i)\sigma(j)} = a_{ij} \tag{D.1} \]

for all $i$ and $j$. Show that there exist optimal strategies $x^*$ and $y^*$ such that
$x^*_i = x^*_{\pi(i)}$ for all $i$ and $y^*_j = y^*_{\sigma(j)}$ for all $j$.

Solution sketch. First, observe that there is an $\ell$ such that $\pi^\ell$ is the identity
permutation (since there must be $k > r$ with $\pi^k = \pi^r$, in which case $\ell = k - r$).
Let $(\pi x)_i = x_{\pi(i)}$ and $(\sigma y)_j = y_{\sigma(j)}$.

Let $\Psi(x) = \min_y x^T A y$. Since $(\pi x)^T A (\sigma y) = x^T A y$, we have $\Psi(x) = \Psi(\pi x)$
for all $x \in \Delta_m$. Therefore, for all $y \in \Delta_n$,

\[ \Big( \frac{1}{\ell} \sum_{k=0}^{\ell-1} \pi^k x \Big)^T A y \ge \frac{1}{\ell} \sum_{k=0}^{\ell-1} \Psi(\pi^k x) = \Psi(x). \]

Thus, if $x$ is optimal, so is $x^* = \frac{1}{\ell} \sum_{k=0}^{\ell-1} \pi^k x$. Clearly $\pi x^* = x^*$.


2.22. Player I chooses a positive integer $x > 0$ and player II chooses a positive integer
$y > 0$. The player with the lower number pays a dollar to the player with the
higher number unless the higher number is more than twice the lower, in which case
the payments are reversed.

\[
A(x,y) =
\begin{cases}
1 & \text{if } y < x \le 2y \text{ or } x < y/2, \\
-1 & \text{if } x < y \le 2x \text{ or } y < x/2, \\
0 & \text{if } x = y.
\end{cases}
\]
Find the unique optimal strategy in this game.


Solution sketch. 1 strictly dominates any $x > 4$, and 4 strictly dominates 3.
Restricting to 1, 2, and 4, we get Rock-Paper-Scissors.


Chapter 
Show that there is no pure Nash equilibrium, only a unique mixed one, and both 


commitment strategy pairs have the property that the player who did not make 
the commitment still gets the Nash equilibrium payoff. 


                    player II
                    C           D
player I    A    (6, -10)    (0, 10)
            B    (4, 1)      (1, 0)

Solution sketch. In this game, there is no pure Nash equilibrium (one of the
players always prefers another strategy, in a cyclic fashion). For mixed strategies,
if player I plays (A, B) with probabilities $(p, 1 - p)$ and player II plays (C, D)
with probabilities $(q, 1 - q)$, then the expected payoffs are $1 + 3q - p + 3pq$ for
player I and $10p + q - 21pq$ for player II. We easily get that the unique mixed
equilibrium is $p = 1/21$ and $q = 1/3$, with payoffs 2 for player I and 10/21 for
player II.

If player I can make a commitment, then by choosing $p = 1/21 - \varepsilon$ for
some small $\varepsilon > 0$, she will make player II choose $q = 1$, and the payoffs will be
$4 + 2/21 - 2\varepsilon$ for player I and $10/21 + 11\varepsilon$ for player II. If player II can make a
commitment, then by choosing $q = 1/3 + \varepsilon$, he will make player I choose $p = 1$,
and the payoffs will be $2 + 6\varepsilon$ for player I and $10/3 - 20\varepsilon$ for player II. Notice that
in both of these commitment strategy pairs, the player who did not make the
commitment gets a larger payoff than he does in the symmetric Nash equilibrium.
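The mixed equilibrium quoted in this solution comes from two indifference equations, which can be solved exactly. In the sketch below the function names `mixed_equilibrium` and `payoff` are assumptions for illustration; the formulas presume a $2 \times 2$ game with a fully mixed equilibrium, so the denominators are nonzero.

```python
from fractions import Fraction as F

def mixed_equilibrium(A, B):
    """Fully mixed equilibrium (p, q) of a 2x2 bimatrix game.

    A[i][j], B[i][j] are the payoffs to players I and II. p is player I's
    probability of row 0, chosen to make player II indifferent; q is player
    II's probability of column 0, chosen to make player I indifferent.
    """
    p = F(B[1][1] - B[1][0], B[0][0] - B[0][1] - B[1][0] + B[1][1])
    q = F(A[1][1] - A[0][1], A[0][0] - A[0][1] - A[1][0] + A[1][1])
    return p, q

def payoff(M, p, q):
    """Expected payoff under matrix M when row 0 has prob. p and column 0 has prob. q."""
    return (p * q * M[0][0] + p * (1 - q) * M[0][1]
            + (1 - p) * q * M[1][0] + (1 - p) * (1 - q) * M[1][1])

A = [[6, 0], [4, 1]]        # player I: rows A, B; columns C, D
B = [[-10, 10], [1, 0]]     # player II
p, q = mixed_equilibrium(A, B)
print(p, q, payoff(A, p, q), payoff(B, p, q))

# Player I's commitment to p = 1/21 - eps, answered by q = 1:
eps = F(1, 1000)
print(payoff(A, p - eps, 1), payoff(B, p - eps, 1))
```

The same `payoff` helper can be used to evaluate any other commitment pair by plugging in the committed probability and the best response.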
Chapter

Show that any $d$-simplex in $\mathbb{R}^d$ contains a ball.
Solution sketch. The $d$-simplex $\Delta_0$ with vertices the origin and the standard basis
$e_1, \ldots, e_d$ in $\mathbb{R}^d$ contains the ball $B(y, \frac{1}{2d})$ where $y := \frac{1}{2d}(e_1 + \cdots + e_d)$.
Given an arbitrary $d$-simplex $\Delta$, by translation we may assume its vertices are
$0, v_1, \ldots, v_d$. Let $A$ be the square matrix with columns $v_i$ for $i \le d$. Since these
columns are linearly independent, $A$ is invertible. Then $\Delta$ contains $B(Ay, \varepsilon)$,
where $\varepsilon := \min\{\|Ax\| \text{ such that } \|x\| = \frac{1}{2d}\} > 0$.


Let $K \subset \mathbb{R}^d$ be a compact convex set which contains a $d$-simplex. Show that $K$
is homeomorphic to a closed ball.
Solution sketch. Suggested steps:


(i) By $K$ contains a ball $B(z, \varepsilon)$. By translation, assume without
loss of generality that $B(0, \varepsilon) \subset K$.


(ii) Show that $p : \mathbb{R}^d \to \mathbb{R}$ defined by
\[ p(x) = \inf\{r > 0 : x/r \in K\} \]


is subadditive (i.e., p(x + y) < p(x) + p(y)) and satisfies 


for all x. Deduce that p is continuous. 


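Solution of step (ii): Suppose $x/r \in K$ and $y/s \in K$, where $r, s > 0$. Then

\[ \frac{x + y}{r + s} = \frac{r}{r + s} \cdot \frac{x}{r} + \frac{s}{r + s} \cdot \frac{y}{s} \in K, \]

so $p(x + y) \le r + s$. Therefore $p(x + y) \le p(x) + p(y)$, from which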


Similarly, 


(iii) Define 


for x # 0 and h(0) = 0 and show that h : K > B(0,1) is a homeomorphism. 


Chapter [6] 
Consider the zero-sum two-player game in which the game to be played is
randomized by a fair coin toss. (This example was discussed in §2.5.1.) If the toss
comes up heads, the payoff matrix is given by $A^H$, and if tails, it is given by $A^T$:

                     player II                                player II
                      L    R                                   L    R
A^H =  player I  U    8    2        and   A^T =  player I  U   2    6
                 D    6    0                               D   4   10

For each of the settings below, draw the Bayesian game tree, convert to 
normal form, and find the value of the game. 


(a) Suppose that player I is told the result of the coin toss and both players 
play simultaneously. 


(b) Suppose that player I is told the result of the coin toss, but she must reveal 
her move first. 


Solution sketch. 


(a) In what follows $U_H$ (respectively, $U_T$) means that player I plays U if the coin
toss is heads (respectively, tails), and $D_H$ (respectively, $D_T$) means that player
I plays D if the coin toss is heads (respectively, tails).

                       player II
                        L    R
player I   U_H, U_T     5    4
           U_H, D_T     6    6
           D_H, U_T     4    3
           D_H, D_T     5    5

The value of the game is 6, since row 2 dominates all other rows.


(b) A column strategy such as $L_U, L_D$ means that player II plays L regardless
of what move player I reveals, whereas strategy $L_U, R_D$ means that player II plays
L if player I reveals U, but plays R if player I reveals D.

                                  player II
                        L_U,L_D   L_U,R_D   R_U,R_D   R_U,L_D
player I   U_H, U_T        5         5         4         4
           U_H, D_T        6         9         6         3
           D_H, U_T        4         1         3         6
           D_H, D_T        5         5         5         5

The value of the game is 5. Clearly the value of the game is at least 5, since
player I can play the pure strategy $D_H, D_T$. To see that it is at most 5, observe
that row 1 is dominated by row 4 for player I and column 1 is dominated by
column 3 for player II. By playing $R_U, R_D$ with probability 0.5 and $R_U, L_D$
with probability 0.5, player II holds player I's expected payoff to 4, 4.5, 4.5, and 5
against rows 1 through 4, so player I's payoff is at most 5.
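Both normal forms can be generated mechanically from $A^H$ and $A^T$ by averaging over the fair coin. The sketch below rebuilds the two tables and checks the bounds on the value in setting (b); the encodings (0 for U or L, 1 for D or R) and all names are assumptions made for illustration.

```python
from fractions import Fraction as F

A_H = [[8, 2], [6, 0]]   # payoffs if heads: rows U, D; columns L, R
A_T = [[2, 6], [4, 10]]  # payoffs if tails

# Player I's strategies (move on heads, move on tails), with 0 = U and 1 = D,
# in the order (U,U), (U,D), (D,U), (D,D).
ROWS = [(h, t) for h in (0, 1) for t in (0, 1)]

def table_simultaneous():
    """Setting (a): player II picks a single column c (0 = L, 1 = R)."""
    return [[F(A_H[h][c] + A_T[t][c], 2) for c in (0, 1)] for h, t in ROWS]

# Setting (b): player II's strategies (reply to U, reply to D), in the order
# (L,L), (L,R), (R,R), (R,L) used in the table above.
RESPONSES = [(0, 0), (0, 1), (1, 1), (1, 0)]

def table_sequential():
    """Setting (b): player II sees player I's move and replies with r[move]."""
    return [[F(A_H[h][r[h]] + A_T[t][r[t]], 2) for r in RESPONSES] for h, t in ROWS]

tb = table_sequential()
lower = min(tb[3])                                    # row (D, D) guarantees this much
mix = [(row[2] + row[3]) / 2 for row in tb]           # equal mix of the last two columns
upper = max(mix)                                      # cap on player I's payoff
print(table_simultaneous(), tb, lower, upper)
```

Since the guaranteed lower bound and the cap coincide, the value in setting (b) is pinned down without solving a linear program.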


Chapter 
Consider the following symmetric game as played by two drivers, both trying 


to get from Here to There (or two computers routing messages along cables of 
different bandwidths). There are two routes from Here to There; one is wider, 
and therefore faster, but congestion will slow them down if both take the same 
route. Denote the wide route W and the narrower route N. The payoff matrix 
is 


                         player II (yellow)
                           W         N
player I (red)    W      (3,3)     (5,4)
                  N      (4,5)     (2,2)


FIGURE D.2. The left-most image shows the payoffs when both drivers drive
on the narrower route, the middle image shows the payoffs when both drivers
drive on the wider route, and the right-most image shows what happens when
the red driver (player I) chooses the wide route and the yellow driver (player
II) chooses the narrow route.


Find all Nash equilibria and determine which ones are evolutionarily stable. 


Solution sketch. There are two pure Nash equilibria: (W, N) and (N, W).

If player I chooses W with probability $x$, player II's payoff for choosing W is
$3x + 5(1 - x)$, and for choosing N is $4x + 2(1 - x)$. Equating these, we get that
the symmetric Nash equilibrium is when both players take the wide route with
probability $x = 0.75$, resulting in an expected payoff of 3.5 for both players.

Is this an evolutionarily stable equilibrium? Let $x = (.75, .25)$ be our equilibrium
strategy. We already checked that $x^T A x = z^T A x$ for all pure strategies $z$;
we need only check that $z^T A z < x^T A z$. For $z = (1,0)$, $x^T A z = 3.25 > z^T A z = 3$
and for $z = (0,1)$, $x^T A z = 4.25 > z^T A z = 2$, implying that $x$ is evolutionarily
stable.
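The two conditions used in this check can be evaluated exactly. A minimal sketch, where the helper name `u` and the encoding of strategies as probability vectors over (W, N) are assumptions for illustration:

```python
from fractions import Fraction as F

A = [[3, 5], [4, 2]]   # payoff to a driver: own route in rows (W, N), opponent's in columns

def u(p, q):
    """Expected payoff p^T A q to a player using p against an opponent using q."""
    return sum(p[i] * A[i][j] * q[j] for i in range(2) for j in range(2))

x = [F(3, 4), F(1, 4)]          # the symmetric mixed equilibrium
pure = [[1, 0], [0, 1]]         # the pure strategies W and N

# Equilibrium condition: every pure strategy does exactly as well against x as x does.
print([u(z, x) == u(x, x) for z in pure])
# Stability condition: x does strictly better against z than z does against itself.
print([(u(x, z), u(z, z)) for z in pure])
```

The second line prints the pairs $(3.25, 3)$ and $(4.25, 2)$ as exact fractions, matching the numbers in the solution.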


Chapter 


Show that the price of anarchy bound for the market sharing game from §8.3 can
be improved to $2 - 1/k$ when there are $k$ teams. Show that this bound is tight.
Solution sketch. We know that $S$ includes some top $j$ cities and that $S^*$ covers the
top $k$. For each $i$, we have $u_i(c_i, c_{-i}) \ge \frac{V(S^*) - V(S)}{k - j}$. So
$\sum_{i \le k} (k - j)\, u_i(c_i, c_{-i}) \ge k(V(S^*) - V(S))$ or $kV(S^*) \le (2k - j)V(S)$, so
$V(S^*) \le (2 - 1/k)V(S)$ since $j \ge 1$.

Consider an auctioneer selling a single item via a first-price auction: Each of
the $n$ bidders submits a bid, say $b_i$ for the $i$th bidder, and, given the bid vector
$b = (b_1, \ldots, b_n)$, the auctioneer allocates the item to the highest bidder at a price
equal to her bid. (The auctioneer employs some deterministic tie-breaking rule.)
Each bidder has a value $v_i$ for the item. A bidder's utility from the auction
when the bid vector is $b$ and her value is $v_i$ is

\[
u_i[b \mid v_i] =
\begin{cases}
v_i - b_i, & i \text{ wins the auction}, \\
0, & \text{otherwise}.
\end{cases}
\]

Each bidder will bid in the auction so as to maximize her (expected) utility. The
expectation here is over any randomness in the bidder strategies. The social
surplus $V(b)$ of the auction is the sum of the utilities of the bidders and the
auctioneer revenue. Since the auctioneer revenue equals the winning bid, we have

\[ V(b) := \text{value of winning bidder}. \]
Show that the price of anarchy is at most $1 - 1/e$; that is, for $b$ a Nash equilibrium,

\[ E[V(b)] \ge \Big( 1 - \frac{1}{e} \Big) \max_i v_i. \]

Hint: Consider what happens when bidder $i$ deviates from $b_i$ to the distribution
with density $f(x) = 1/(v_i - x)$, with support $[0, (1 - 1/e)v_i]$.


Solution sketch. We first show that the price of anarchy is at most 1/2. Suppose
the $n$ bidders have values $v_1, \ldots, v_n$ and their bids, in Nash equilibrium, are
$b = (b_1, \ldots, b_n)$. Without loss of generality, suppose that $v_1 \ge v_i$ for all $i$.
Consider what happens when bidder 1 deviates from $b_1$ to $b_1' := v_1/2$. We have

$$u_1[b \mid v_1] \ge u_1[b_1', b_{-1} \mid v_1] \ge \frac{v_1}{2} - \max_i b_i, \tag{D.1}$$

and since $u_i[b \mid v_i] \ge 0$ for all $i \ne 1$, we have

$$\sum_i u_i[b \mid v_i] \ge \frac{v_1}{2} - \max_i b_i.$$

On the other hand,

$$\sum_i u_i[b \mid v_i] = v_{i^*} - \max_i b_i,$$

where $i^*$ is the winning bidder, so

$$v_{i^*} \ge \frac{v_1}{2}.$$
To extend this to $1 - 1/e$, consider instead what happens when bidder 1
deviates from $b_1$ to the distribution with density $f(x) = 1/(v_1 - x)$, with support
$[0, (1 - 1/e)v_1]$.
Let $p := \max_{i>1} b_i$. Then instead of (D.1) we get

$$u_1[b_1', b_{-1} \mid v_1] \ge \int_p^{(1-1/e)v_1} (v_1 - x) f(x)\,dx \ge \left(1 - \frac{1}{e}\right) v_1 - \max_i b_i.$$

As above, from this we conclude that the value of the winning bidder $v_{i^*}$ satisfies

$$v_{i^*} \ge \left(1 - \frac{1}{e}\right) v_1.$$

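The randomized deviation can be checked numerically. The sketch below assumes illustrative values $v_1 = 1$ and $p = 0.3$ (neither comes from the text) and uses a plain midpoint rule; the function names are ours. It confirms that $f(x) = 1/(v_1 - x)$ integrates to 1 over $[0, (1 - 1/e)v_1]$ and that the expected utility of the deviation equals $(1 - 1/e)v_1 - p$, since the integrand $(v_1 - x) f(x)$ is identically 1.

```python
import math

def midpoint_integral(g, lo, hi, steps=100_000):
    """Approximate the integral of g over [lo, hi] with the midpoint rule."""
    if hi <= lo:
        return 0.0
    h = (hi - lo) / steps
    return h * sum(g(lo + (k + 0.5) * h) for k in range(steps))

def deviation_check(v1, p):
    """Return (total mass of f, expected utility of the deviation, (1 - 1/e) v1 - p)."""
    top = (1.0 - 1.0 / math.e) * v1          # right end of the support
    f = lambda x: 1.0 / (v1 - x)             # deviation density from the hint
    mass = midpoint_integral(f, 0.0, top)
    # Bidder 1 wins, and earns v1 - x, only when the random bid x exceeds p.
    utility = midpoint_integral(lambda x: (v1 - x) * f(x), min(p, top), top)
    return mass, utility, top - p

# Illustrative (assumed) numbers: value 1.0, highest competing bid 0.3.
mass, utility, bound = deviation_check(v1=1.0, p=0.3)
print(mass, utility, bound)
```

Ties at $x = p$ are ignored here because they have probability zero under a continuous bid distribution.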
Chapter 11


EEI) Show that the Talmud rule is monotone in A for all n and coincides with the 
garment rule for n = 2. 


Solution sketch. The monotonicity of the Talmud rule follows from the
monotonicity of CEA and CEL and the observation that if $A = C/2$, then $a_i = c_i/2$
for all $i$. Thus, if $A$ is increased from a value that is at most $C/2$ to a value that
is above $C/2$, then the allocation to each claimant $i$ also increases from below
$c_i/2$ to above $c_i/2$.
Suppose that $n = 2$, and let $c_1 \le c_2$. We consider the two cases:
• $A \le C/2$: Then $c_2 \ge A$.
  – If also $c_1 \ge A$, then $(a_1, a_2) = (A/2, A/2)$.
  – If $c_1 < A$, then $a_1 = \frac{c_1}{2}$ and $a_2 = A - \frac{c_1}{2} \le \frac{c_2}{2}$.
  In both cases, this is the CEA rule with claims $\left(\frac{c_1}{2}, \frac{c_2}{2}\right)$.
• $A > C/2$: Then $c_1 < A$.
  – If $c_2 \ge A$, then $a_1 = \frac{c_1}{2}$ and $a_2 = A - \frac{c_1}{2}$, so $\ell_1 = \frac{c_1}{2}$ and
    $\ell_2 = c_2 - A + \frac{c_1}{2} \le \frac{c_2}{2}$.
  – If $c_2 < A$, then $a_1 = \frac{c_1}{2} + \frac{1}{2}(A - c_2)$ and $a_2 = \frac{c_2}{2} + \frac{1}{2}(A - c_1)$, so
    $\ell_1 = \ell_2 = \frac{C - A}{2}$.
  In both cases, this is the CEL rule with claims $\left(\frac{c_1}{2}, \frac{c_2}{2}\right)$.
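The case analysis above can be cross-checked by computing both rules directly. The sketch below is one possible implementation, not the book's. It assumes that CEA awards $a_i = \min(c_i, \lambda)$, that the Talmud rule applies CEA to the half-claims when $A \le C/2$ and otherwise equalizes the losses $c_i - a_i$ subject to the cap $c_i/2$, and that the garment rule gives each claimant the part the other concedes and splits the rest. Function names are ours, and exact rational arithmetic is used so that the comparison is exact.

```python
from fractions import Fraction

def cea(claims, amount):
    """Constrained equal awards: a_i = min(c_i, lam) with the a_i summing to amount."""
    n = len(claims)
    awards = [Fraction(0)] * n
    remaining = Fraction(amount)
    # Serve claimants from smallest to largest claim, offering an equal share
    # of whatever is left; nobody receives more than the claim.
    for rank, i in enumerate(sorted(range(n), key=lambda i: claims[i])):
        awards[i] = min(Fraction(claims[i]), remaining / (n - rank))
        remaining -= awards[i]
    return awards

def talmud(claims, amount):
    """CEA on half-claims if amount <= C/2; otherwise equalize losses capped at c_i/2."""
    total = sum(Fraction(c) for c in claims)
    half = [Fraction(c) / 2 for c in claims]
    if 2 * Fraction(amount) <= total:
        return cea(half, amount)
    losses = cea(half, total - Fraction(amount))   # CEL on the half-claims
    return [Fraction(c) - l for c, l in zip(claims, losses)]

def garment(c1, c2, amount):
    """Garment rule for two claimants: concede the uncontested parts, split the rest."""
    to_1 = max(Fraction(amount) - c2, 0)   # part claimant 2 does not contest
    to_2 = max(Fraction(amount) - c1, 0)   # part claimant 1 does not contest
    rest = Fraction(amount) - to_1 - to_2
    return [to_1 + rest / 2, to_2 + rest / 2]

print(talmud([100, 200, 300], 200))
print(talmud([100, 200], 250), garment(100, 200, 250))
```

For two claimants the two functions return the same division for every estate between 0 and $C$, which is the coincidence asserted in the exercise.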


Chapter 


BA Show that Instant Runoff violates the Condorcet winner criterion, IIA with
preference strengths, and cancellation of ranking cycles.


Solution sketch. If 40% of the population has preference order $A \succ B \succ C$, 40% of
the population has $C \succ B \succ A$, and 20% of the population has $B \succ A \succ C$, then $B$ is
a Condorcet winner but loses an IRV election. To see that IRV violates IIA with
preference strength, consider what happens when $C$ is moved to the bottom for
the second 40% group. To see that IRV violates cancellation of ranking cycles,
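suppose that 35% of the population has preference order $A \succ B \succ C$, 40% of the
population has $B \succ C \succ A$, and 25% of the population has $C \succ A \succ B$. Then $A$ is the
winner of IRV (once $C$ is eliminated, $A$ leads 60% to 40%), but eliminating the
75% of the population that are in a ranking cycle leaves 10% with $A \succ B \succ C$
and 15% with $B \succ C \succ A$, which changes the winner to $B$.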
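These counterexamples are easy to replay by machine. The sketch below is a minimal Instant Runoff and Condorcet check over weighted complete rankings; the function names are ours and ties are resolved by list order, an arbitrary choice that the examples never trigger. For the ranking-cycle example, the code uses weights 35%, 40%, and 25% on $A \succ B \succ C$, $B \succ C \succ A$, and $C \succ A \succ B$ (an assumption of this sketch), for which removing 25% from each order flips the IRV winner from $A$ to $B$.

```python
def irv_winner(profile):
    """Instant Runoff winner for a profile {ranking tuple: weight}."""
    remaining = sorted({c for ranking in profile for c in ranking})
    while True:
        tally = {c: 0 for c in remaining}
        for ranking, weight in profile.items():
            top = next(c for c in ranking if c in tally)   # highest-ranked survivor
            tally[top] += weight
        leader = max(remaining, key=lambda c: tally[c])
        if len(remaining) == 1 or 2 * tally[leader] > sum(tally.values()):
            return leader
        # No majority yet: eliminate the candidate with the fewest top votes.
        remaining.remove(min(remaining, key=lambda c: tally[c]))

def condorcet_winner(profile):
    """Return the candidate who wins every head-to-head majority vote, or None."""
    candidates = sorted({c for ranking in profile for c in ranking})
    total = sum(profile.values())
    for c in candidates:
        if all(2 * sum(w for r, w in profile.items() if r.index(c) < r.index(d)) > total
               for d in candidates if d != c):
            return c
    return None

ABC, BAC, BCA, CAB, CBA = ("A","B","C"), ("B","A","C"), ("B","C","A"), ("C","A","B"), ("C","B","A")

first = {ABC: 40, CBA: 40, BAC: 20}      # Condorcet example from the text
moved = {ABC: 40, BAC: 60}               # C moved to the bottom for the C > B > A group
cycle_before = {ABC: 35, BCA: 40, CAB: 25}
cycle_after = {ABC: 10, BCA: 15}         # 25% removed from each order of the cycle

print(condorcet_winner(first), irv_winner(first), irv_winner(moved))
print(irv_winner(cycle_before), irv_winner(cycle_after))
```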


Chapter 


14.5 Find a symmetric equilibrium in the war of attrition auction discussed in §14.4.3
under the assumption that bids are committed to up-front, rather than in the
more natural setting where a player's bid (the decision as to how long to stay in)
can be adjusted over the course of the auction.
Solution sketch. Let $\beta$ be a symmetric strictly increasing equilibrium strategy.
The expected payment $p(v)$ of an agent in a war-of-attrition auction in which all
bidders use $\beta$ is

$$p(v) = F(v)^{n-1}\, E\Big[\max_{i \le n-1} \beta(V_i) \,\Big|\, \max_{i \le n-1} V_i \le v\Big] + \big(1 - F(v)^{n-1}\big)\,\beta(v).$$

Equating this with $p(v)$ from (14.9), we have

$$\int_0^v \big(F(v)^{n-1} - F(w)^{n-1}\big)\,dw = \int_0^v \beta(w)(n-1)F(w)^{n-2} f(w)\,dw + \big(1 - F(v)^{n-1}\big)\,\beta(v).$$

Differentiating both sides with respect to $v$, cancelling common terms, and
simplifying yields

$$\beta'(v) = \frac{(n-1)\,v\,F(v)^{n-2} f(v)}{1 - F(v)^{n-1}},$$

and hence

$$\beta(v) = \int_0^v \frac{(n-1)\,w\,F(w)^{n-2} f(w)}{1 - F(w)^{n-1}}\,dw.$$

For two players with $F$ uniform on $[0, 1]$ this yields

$$\beta(v) = \int_0^v \frac{w}{1-w}\,dw = -v - \log(1 - v).$$
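As a sanity check on the two-player uniform case, we can verify numerically that $\beta(v) = -v - \log(1 - v)$ satisfies the payment identity, which for $n = 2$ and $F(w) = w$ reads $\int_0^v (v - w)\,dw = \int_0^v \beta(w)\,dw + (1 - v)\beta(v)$. The sketch below uses a simple midpoint rule; the function names and the sample values of $v$ are ours.

```python
import math

def beta(v):
    """Candidate equilibrium bid for n = 2 with F uniform on [0, 1] (needs v < 1)."""
    return -v - math.log(1.0 - v)

def midpoint_integral(g, lo, hi, steps=20_000):
    """Approximate the integral of g over [lo, hi] with the midpoint rule."""
    h = (hi - lo) / steps
    return h * sum(g(lo + (k + 0.5) * h) for k in range(steps))

def payment_gap(v):
    """Left side minus right side of the payment identity at value v."""
    lhs = midpoint_integral(lambda w: v - w, 0.0, v)            # equals v^2 / 2
    rhs = midpoint_integral(beta, 0.0, v) + (1.0 - v) * beta(v)
    return lhs - rhs

for v in (0.1, 0.5, 0.9):
    print(v, beta(v), payment_gap(v))
```

The gap should be zero up to discretization error. Note that $\beta(v) \to \infty$ as $v \to 1$, so the check is only meaningful for $v < 1$.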


14.13 Determine the explicit payment rule for the three tie-breaking rules just discussed.


Solution sketch. Fix a bidder, say 1. We consider all the possibilities for when
bidder 1 might win and what his payment is in each case. Suppose that

$$\varphi = \max_{i \ge 2} \psi_i(b_i)$$

is attained $k$ times by bidders $i \ge 2$. Let

$$[v_-(\varphi), v_+(\varphi)] = \{b : \psi_1(b) = \varphi\}$$

and

$$b_* = \max\{b_i : \psi_i(b_i) = \varphi,\ i \ge 2\}.$$

• Tie-breaking by bid:
  – If $\psi_1(b_1) > \varphi$, then bidder 1 wins and pays $\max\{b_*, v_-(\varphi)\}$.
  – If $\psi_1(b_1) = \varphi$ and $b_1$ is largest among those with virtual valuation at
    least $\varphi$, then bidder 1 wins and pays $\max\{b_*, v_-(\varphi)\}$.
• Tie-breaking according to a fixed ordering of bidders:
  – If $\psi_1(b_1) = \varphi$ and bidder 1 wins (has the highest rank), then his
    payment is $v_-(\varphi)$.
  – If $\psi_1(b_1) > \varphi$, then his payment is $v_-(\varphi)$ if he has the highest rank
    and $v_+(\varphi)$ otherwise.
• Random tie-breaking:
  – If $\psi_1(b_1) = \varphi$, then bidder 1 wins with probability $\frac{1}{k+1}$, and if bidder
    1 wins, he is charged $v_-(\varphi)$.
  – If $\psi_1(b_1) > \varphi$, then bidder 1 wins, and he is charged
    $$\frac{1}{k+1}\, v_-(\varphi) + \frac{k}{k+1}\, v_+(\varphi),$$
    because in $\frac{1}{k+1}$ of the permutations he will be ranked above the other $k$
    bidders with virtual value $\varphi$.

14.14 Consider two bidders where bidder 1's value is drawn from an exponential
distribution with parameter 1 and bidder 2's value is drawn independently from
Unif[0, 1]. What is the Myerson optimal auction in this case? Show that if
$(v_1, v_2) = (1.5, 0.8)$, then bidder 2 wins.


Solution sketch. We first compute the virtual value functions:

$$\psi_1(v_1) = v_1 - 1 \quad\text{and}\quad \psi_2(v_2) = v_2 - \frac{1 - v_2}{1} = 2v_2 - 1.$$

Thus, bidder 1 wins when $v_1 - 1 \ge \max(0, 2v_2 - 1)$, whereas bidder 2 wins when
$2v_2 - 1 > \max(0, v_1 - 1)$. If $(v_1, v_2) = (1.5, 0.8)$, then bidder 2 wins and pays
$\psi_2^{-1}(\psi_1(1.5)) = 0.75$. This shows that in the optimal auction with non-i.i.d.
bidders, the bidder with the highest value might not win.
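The allocation and payment in this example follow a simple recipe: compare virtual values, and charge the winner the smallest value at which she would still win. The sketch below hard-codes the two virtual value functions computed above; the function name `myerson_outcome` and the convention that exact ties go to bidder 1 are our assumptions.

```python
def psi1(v):
    """Virtual value for Exp(1): v - (1 - F(v)) / f(v) = v - 1."""
    return v - 1.0

def psi2(v):
    """Virtual value for Unif[0, 1]: v - (1 - v) / 1 = 2v - 1."""
    return 2.0 * v - 1.0

def myerson_outcome(v1, v2):
    """Return (winner, payment); winner is None when both virtual values are negative."""
    p1, p2 = psi1(v1), psi2(v2)
    if p1 >= max(0.0, p2):
        # Threshold for bidder 1: psi1^{-1}(max(0, psi2(v2))).
        return 1, max(0.0, p2) + 1.0
    if p2 > max(0.0, p1):
        # Threshold for bidder 2: psi2^{-1}(max(0, psi1(v1))).
        return 2, (max(0.0, p1) + 1.0) / 2.0
    return None, 0.0

print(myerson_outcome(1.5, 0.8))
print(myerson_outcome(3.0, 0.8))
print(myerson_outcome(0.5, 0.3))
```

The thresholds also give the reserve prices implied by this sketch: bidder 1 never wins with a value below 1, and bidder 2 never wins with a value below 1/2.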


Chapter 
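18.7 (a) For $Y$ a normal random variable, $N(0,1)$, show that

$$e^{-\frac{y^2}{2}(1+o(1))} \le P[Y > y] \le e^{-\frac{y^2}{2}} \quad \text{as } y \to \infty.$$

(b) Suppose that $Y_1, \ldots, Y_n$ are i.i.d. $N(0,1)$ random variables. Show that

$$E\Big[\max_{1 \le i \le n} Y_i\Big] = \sqrt{2 \log n}\,(1 + o(1)) \quad \text{as } n \to \infty. \tag{D.1}$$
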

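Solution sketch.

(a) For $y > 0$,

$$\int_y^\infty e^{-x^2/2}\,dx \le \int_y^\infty \frac{x}{y}\, e^{-x^2/2}\,dx = \frac{1}{y}\, e^{-y^2/2}
\quad\text{and}\quad \int_y^{y+1} e^{-x^2/2}\,dx \ge e^{-(y+1)^2/2}.$$

(b) Let $M_n := \max_{1 \le i \le n} Y_i$. Then by a union bound

$$P\Big[M_n \ge \sqrt{2 \log n} + \frac{x}{\sqrt{2 \log n}}\Big] \le n e^{-(\log n + x)} = e^{-x}.$$

On the other hand,

$$P\big[Y_i > \sqrt{2\alpha \log n}\big] = n^{-\alpha(1+o(1))},$$

so $P\big[M_n \le \sqrt{2\alpha \log n}\big] = \big(1 - n^{-\alpha(1+o(1))}\big)^n \to 0$ for $\alpha < 1$.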
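The tail estimates can be illustrated numerically with the standard library. The sketch below computes $P[Y > y]$ from the complementary error function, compares it with $e^{-y^2/2}$, and reports the ratio of $-\log P[Y > y]$ to $y^2/2$, which part (a) says tends to 1. It also evaluates $P[M_n \le \sqrt{2\alpha \log n}] = (1 - P[Y > \sqrt{2\alpha \log n}])^n$ for an illustrative $\alpha < 1$. The function names and the sample values of $y$, $n$, and $\alpha$ are ours.

```python
import math

def gaussian_tail(y):
    """P[Y > y] for Y ~ N(0, 1), via the complementary error function."""
    return 0.5 * math.erfc(y / math.sqrt(2.0))

def tail_exponent_ratio(y):
    """-log P[Y > y] divided by y^2 / 2; this should approach 1 as y grows."""
    return -math.log(gaussian_tail(y)) / (y * y / 2.0)

def prob_max_below(n, alpha):
    """P[max of n i.i.d. N(0,1) <= sqrt(2 * alpha * log n)], using independence."""
    threshold = math.sqrt(2.0 * alpha * math.log(n))
    return (1.0 - gaussian_tail(threshold)) ** n

for y in (1.0, 2.0, 5.0, 10.0, 20.0):
    print(y, gaussian_tail(y), math.exp(-y * y / 2.0), tail_exponent_ratio(y))

for n in (10**2, 10**4, 10**6):
    print(n, prob_max_below(n, 0.5))
```

The ratio approaches 1 only slowly, which is consistent with the $o(1)$ correction in (D.1) decaying slowly in $n$.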